Nontargeted LC-MS can assay thousands of chemical entities in a single biospecimen, but in that volume of data, how do you separate true spectral features from noise? This paper, to which Sapient’s scientists contributed, describes machine learning-based approaches that removed up to 90% of false peaks without reducing true positive signals, with excellent reproducibility across multiple data sets.
Abstract
Liquid chromatography–mass spectrometry (LC-MS)-based metabolomics has emerged as a valuable tool for biological discovery, capable of assaying thousands of diverse chemical entities in a single biospecimen. Processing of nontargeted LC-MS spectral data requires identification and isolation of true spectral features from the random, false noise peaks that comprise a significant portion of total signals, using inexact peak selection algorithms and time-consuming visual inspection of data. To increase the fidelity and speed of data processing, herein we establish, optimize, and evaluate a machine learning pipeline employing deep neural networks as well as a simpler multiple logistic regression model for classification of spectral features from nontargeted LC-MS metabolomics data. Machine learning-based approaches were found to remove up to 90% of false peaks from complex nontargeted LC-MS data sets without reducing true positive signals and exhibit excellent reproducibility across multiple data sets. Application of machine learning for nontargeted LC-MS-based peak selection provides for robust and scalable peak classification and data filtering, enabling handling and processing of large scale, complex metabolomics data sets.
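The abstract names a multiple logistic regression model as the simpler of the two classifiers but gives no features, architecture, or training details. To illustrate the general idea of peak classification and filtering, here is a minimal sketch only, not the authors' pipeline. It assumes two hypothetical per-peak features (a log10 signal-to-noise ratio and a peak-shape score) and uses purely synthetic data; every function name below is invented for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_synthetic_peaks(n, seed=0):
    """Toy data only (assumption): each row is ([log10 S/N, shape score], label),
    where label 1 marks a true peak and 0 a false (noise) peak."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        is_true = rng.random() < 0.5
        if is_true:
            x = [rng.gauss(1.5, 0.4), rng.gauss(0.8, 0.1)]
        else:
            x = [rng.gauss(0.5, 0.4), rng.gauss(0.4, 0.15)]
        rows.append((x, 1 if is_true else 0))
    return rows


def fit_logistic(rows, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log loss; returns (weights, bias)."""
    n_feat = len(rows[0][0])
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for x, y in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b


def predict_proba(model, x):
    """Estimated probability that feature vector x belongs to a true peak."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def filter_peaks(model, rows, threshold=0.5):
    """Keep only the peaks the model scores at or above the threshold."""
    return [(x, y) for x, y in rows if predict_proba(model, x) >= threshold]


train = make_synthetic_peaks(400, seed=1)
test = make_synthetic_peaks(200, seed=2)
model = fit_logistic(train)
kept = filter_peaks(model, test)
accuracy = sum(
    (predict_proba(model, x) >= 0.5) == (y == 1) for x, y in test
) / len(test)
print(f"kept {len(kept)} of {len(test)} candidate peaks; accuracy {accuracy:.2f}")
```

Raising `threshold` in this sketch discards more candidate peaks at the risk of losing true ones, which is the trade-off behind the abstract's claim of removing false peaks without reducing true positive signals. Real feature sets and thresholds would have to come from the paper itself.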

Deep Neural Networks for Classification of LC-MS Spectral Peaks