Deep Neural Networks for Classification of LC-MS Spectral Peaks
Nontargeted LC-MS can assay thousands of chemical entities in a single biospecimen, but in that crush of data, how do you isolate true spectral features from the noise? This paper, with contributions from Sapient scientists, describes machine learning approaches that removed up to 90% of false peaks without reducing true positive signals and showed excellent reproducibility across multiple data sets.
Liquid chromatography–mass spectrometry (LC-MS)-based metabolomics has emerged as a valuable tool for biological discovery, capable of assaying thousands of diverse chemical entities in a single biospecimen. Processing nontargeted LC-MS spectral data requires separating true spectral features from the random noise peaks that make up a significant portion of the total signal, a task that relies on inexact peak selection algorithms and time-consuming visual inspection of the data. To increase the fidelity and speed of data processing, herein we establish, optimize, and evaluate a machine learning pipeline that uses deep neural networks, as well as a simpler multiple logistic regression model, to classify spectral features from nontargeted LC-MS metabolomics data. The machine learning approaches removed up to 90% of false peaks from complex nontargeted LC-MS data sets without reducing true positive signals and showed excellent reproducibility across multiple data sets. Applying machine learning to nontargeted LC-MS peak selection provides robust, scalable peak classification and data filtering, enabling the processing of large-scale, complex metabolomics data sets.
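To make the classification-and-filtering idea concrete, here is a minimal sketch of the simpler of the two models: a multiple logistic regression that scores each detected peak as true or false, followed by a filter that reports what fraction of false peaks is removed and what fraction of true peaks is retained. The abstract does not state which per-peak features the authors used, so the two inputs here (a log signal-to-noise ratio and a peak-shape score) are hypothetical, and the data are synthetic, generated only for illustration. The helper names (`make_synthetic_peaks`, `fit_logistic`, `filter_report`) are likewise invented for this example and are not the paper's code.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_peaks(n=200, seed=0):
    """Synthetic, illustrative peaks. Features are HYPOTHETICAL:
    [log10 signal-to-noise, peak-shape score]; label 1 = true peak, 0 = noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else 0
        if label:
            snr, shape = rng.gauss(2.0, 0.4), rng.gauss(0.85, 0.08)
        else:
            snr, shape = rng.gauss(0.8, 0.4), rng.gauss(0.45, 0.15)
        X.append([snr, shape])
        y.append(label)
    return X, y


def standardize(X):
    # Z-score each feature column so gradient descent converges evenly.
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    Xs = [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]
    return Xs, means, sds


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Multiple logistic regression fitted by batch gradient descent on log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, X):
    # Probability that each peak is a true spectral feature.
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def filter_report(probs, y, threshold=0.5):
    """Return (fraction of false peaks removed, fraction of true peaks kept)."""
    keep = [p >= threshold for p in probs]
    n_false = sum(1 for yi in y if yi == 0)
    n_true = sum(1 for yi in y if yi == 1)
    false_removed = sum(1 for k, yi in zip(keep, y) if yi == 0 and not k)
    true_kept = sum(1 for k, yi in zip(keep, y) if yi == 1 and k)
    return false_removed / n_false, true_kept / n_true


if __name__ == "__main__":
    X, y = make_synthetic_peaks()
    Xs, _, _ = standardize(X)
    w, b = fit_logistic(Xs, y)
    removed, kept = filter_report(predict_proba(w, b, Xs), y)
    print(f"false peaks removed: {removed:.0%}, true peaks kept: {kept:.0%}")
```

Lowering `threshold` trades some false-peak removal for higher retention of true peaks, which is the knob one would tune toward the goal of not reducing true positive signals. A deep neural network variant would replace the linear score `w·x + b` with a multilayer network trained on the same labels; the filtering and evaluation steps stay the same. In practice the reported numbers should come from held-out data rather than the training set used here for brevity.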