Feature detection is a critical step in the preprocessing of liquid chromatography–mass spectrometry (LC–MS) metabolomics data. Currently, the predominant approach is to detect features using noise filters and peak shape models based on the data at hand alone. Databases of known metabolites and historical data contain information that could help boost the sensitivity of feature detection, especially for low-concentration metabolites. However, utilizing such information in targeted feature detection may cause a large number of false positives because of the high levels of noise in LC–MS data. With high-resolution mass spectrometry such as liquid chromatography–Fourier transform mass spectrometry (LC–FTMS), high-confidence matching of peaks to known features is feasible. Here we describe a computational approach that serves two purposes. First, it boosts feature detection sensitivity by using a hybrid procedure of both untargeted and targeted peak detection. New algorithms are designed to reduce the chance of false positives through nonparametric local peak detection and filtering. Second, it can accumulate information on the concentration variation of metabolites over a large number of samples, which can help find rare features and/or features with uncommon concentrations in future studies. Information can be accumulated on features that are consistently found in real data even before their identities are known. We demonstrate the value of the approach in a proof-of-concept study. The method is implemented as part of the R package apLCMS at http://www.sph.emory.edu/apLCMS/.
Keywords:
metabolomics; mass spectrometry; bioinformatics
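The abstract's premise is that high-resolution LC–FTMS makes it feasible to match detected peaks to known features with high confidence. The sketch below illustrates only that matching step, assuming a generic rule: a peak matches a known feature when its m/z lies within a parts-per-million (ppm) tolerance and its retention time lies within a fixed window. It is not the apLCMS algorithm. The names `Feature`, `ppm_diff`, and `match_known`, and the default tolerances (10 ppm, 30 s), are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    mz: float  # mass-to-charge ratio
    rt: float  # retention time (seconds)


def ppm_diff(observed: float, reference: float) -> float:
    """Absolute m/z deviation of `observed` from `reference`, in ppm."""
    return abs(observed - reference) / reference * 1e6


def match_known(peaks, known, mz_ppm=10.0, rt_tol=30.0):
    """Map each known-feature index to the index of its best-matching peak.

    Hypothetical rule: a peak is a candidate if it is within `mz_ppm` of the
    known m/z AND within `rt_tol` seconds of the known retention time; among
    candidates, the one with the smallest ppm deviation is kept. Known
    features with no candidate are left out of the result.
    """
    matches = {}
    for k, ref in enumerate(known):
        best_idx, best_ppm = None, None
        for p, peak in enumerate(peaks):
            d = ppm_diff(peak.mz, ref.mz)
            if d <= mz_ppm and abs(peak.rt - ref.rt) <= rt_tol:
                if best_ppm is None or d < best_ppm:
                    best_idx, best_ppm = p, d
        if best_idx is not None:
            matches[k] = best_idx
    return matches


if __name__ == "__main__":
    known = [Feature(180.0634, 300.0), Feature(146.0691, 120.0)]
    peaks = [Feature(180.0640, 310.0), Feature(146.0800, 118.0), Feature(500.0, 50.0)]
    print(match_known(peaks, known))  # {0: 0}
```

Here the first known feature matches the first peak (about 3.3 ppm, 10 s apart), while the second is rejected because its nearest peak is roughly 75 ppm away. A narrow ppm tolerance is what high mass resolution buys; with noisier, lower-resolution data the same rule would admit many spurious matches, which is the false-positive risk the abstract describes.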