Article | October 10, 2024

Improving metabolite identification in nontargeted metabolomic and lipidomic studies

Nontargeted metabolomic and lipidomic studies hold tremendous potential to discover novel biomarkers that can elucidate complex biological mechanisms, inform disease subtyping, and identify patients most likely to respond to therapy. Next-generation bioanalytical technologies are rapidly expanding the number of small molecules that can be measured in a single sample, capturing thousands of compounds in heterogeneous matrices ranging from plasma, urine, and stool to more specialized biofluids like cerebrospinal fluid (CSF), tears, and breastmilk.

Such methods are capable of simultaneously measuring the endogenous metabolome and lipidome as well as the exposome, which comprises the entire small molecule population originating from the microbiome, dietary and lifestyle habits, and environmental exposures. This has allowed the field to advance beyond targeted studies of only known, well-characterized small molecules to probe the broader metabolome, lipidome, and exposome for the most biologically relevant signals that influence human biology and disease.

The challenge is that most of these thousands of compounds have yet to be identified, meaning their chemical structures are still unknown. An ‘unknown’ small molecule can be tracked from patient to patient just as easily as a known one, and can therefore be found to have clear biological significance in a study. Accurate metabolite identification is still needed, however, to chemically fingerprint the molecule, to draw meaningful mechanistic insight into its role in biological and disease processes, and to allow the observation to be validated.

Herein we discuss approaches to address and overcome the inherent complexities of metabolite identification, which can enhance the reliability and interpretability of nontargeted metabolomics and lipidomics data for drug development.

Mapping metabolites to mechanisms to maximize the impact of small molecule biomarker discoveries

Improving metabolite identification is crucial for maximizing the impact of nontargeted metabolomics and lipidomics data in therapeutic development, disease diagnostics, and personalized medicine. Accurate identification is needed to effectively map a metabolite or lipid biomarker within specific pathways and to understand its involvement in underlying mechanisms of disease and drug response, as well as to aid future translation into clinical applications. Obtaining these identifications has become more complex as the scope of nontargeted discovery screenings has expanded, allowing for simultaneous measurement of more chemical signals across wider dynamic ranges in concentration.

Technical advancements in discovery tools such as liquid chromatography-mass spectrometry (LC-MS) have significantly increased the number of small molecules that can be assayed in a sample, from measuring dozens of well-characterized metabolites and lipids to now capturing thousands across diverse chemical classes. While many of these small molecules are consistently and reproducibly measured in human blood, a large portion have yet to be assigned a chemical identity.

To truly benefit from small molecule biomarkers, we must refine our identification methods to confirm metabolites and lipids with certainty, paving the way for their use in healthcare.

Key challenges for metabolite identification

Metabolites and lipids can be produced endogenously in human systems or can be introduced into the body by exogenous exposures. Small molecules are therefore quite diverse, with origins spanning from genetics to environment to microbes and encompassing a wide range of chemical structures that make comprehensive metabolite identification complex. Additionally, many metabolites exist as isomers with identical masses but different structures, further complicating the ability to distinguish unique identities.
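The isomer problem can be made concrete with a short calculation. The sketch below is illustrative only: the function names and the compound list are chosen for this example, and it uses standard monoisotopic masses to show that leucine and isoleucine (both C6H13NO2), like glucose and fructose (both C6H12O6), are indistinguishable by mass alone.

```python
import re
from collections import defaultdict

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}

def monoisotopic_mass(formula: str) -> float:
    """Sum element masses for a simple formula such as 'C6H13NO2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total

def group_by_mass(compounds: dict, decimals: int = 4) -> dict:
    """Group compound names that share the same rounded monoisotopic mass."""
    groups = defaultdict(list)
    for name, formula in compounds.items():
        groups[round(monoisotopic_mass(formula), decimals)].append(name)
    return dict(groups)

# Example compounds chosen for illustration.
compounds = {
    "leucine": "C6H13NO2",
    "isoleucine": "C6H13NO2",
    "glucose": "C6H12O6",
    "fructose": "C6H12O6",
}

for mass, names in group_by_mass(compounds).items():
    print(f"{mass:.4f} Da: {', '.join(names)}")
```

Each group that contains more than one name is a set of compounds that a mass measurement alone cannot separate, which is why additional evidence such as retention time or fragmentation data is needed.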

A variety of evidence must be collected to accurately identify an unknown metabolite or lipid of interest, and confidence levels for identification vary depending on the amount of data that can be compiled. The Metabolomics Standards Initiative (MSI) has established various levels or tiers for reporting metabolite identifications based on industry-accepted data thresholds.

Table 1. MSI reporting structure for metabolite identification.

Level (Tier) 1 metabolite and lipid identities are confirmed with the highest confidence because their spectral features replicate those of an established chemical standard. The metabolite and the standard must be measured using the same methodologies and workflows for the comparison to be valid. Acquiring and running large sets of reference standards that could potentially match a molecule of interest can be a time- and resource-intensive process.

For molecules without an existing reference standard, tandem mass spectrometry (MS/MS or MS2) fragmentation data can enable confident Tier 2 annotation. Fragmentation patterns can provide greater structural detail and help differentiate compounds with the same m/z value.

Some small molecules, however, have very stable chemical structures that under certain analysis conditions can resist ionization and fragmentation, leading to poor MS2 performance. Additionally, many small molecules produce identical fragments that are not useful to discriminate between compounds that may be structurally similar. Both scenarios necessitate chemical reference standards to confirm the compound’s identity.
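The reporting logic described above can be sketched as a small decision function. This is a simplified illustration, not an official MSI tool: it assumes the commonly cited four-level scheme (standard-confirmed, putatively annotated, class-level, unknown), and the `Evidence` fields and the two-property threshold for Tier 1 are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    # Number of orthogonal properties (e.g. retention time, m/z, MS2) that
    # match an authentic standard measured with the same method.
    standard_matches: int = 0
    # MS2 spectrum resembles a library spectrum, but no standard was run.
    library_spectral_match: bool = False
    # Evidence supports only a chemical class, not a single structure.
    class_level_evidence: bool = False

def msi_tier(evidence: Evidence) -> int:
    """Return an assumed MSI-style tier (1 = highest confidence, 4 = unknown)."""
    if evidence.standard_matches >= 2:
        return 1
    if evidence.library_spectral_match:
        return 2
    if evidence.class_level_evidence:
        return 3
    return 4

print(msi_tier(Evidence(standard_matches=3)))             # standard-confirmed
print(msi_tier(Evidence(library_spectral_match=True)))    # putative annotation
print(msi_tier(Evidence()))                               # unknown
```

A compound with poor or non-discriminating MS2 data would fall through the second branch, which mirrors the point that such cases need a reference standard to reach a confident identification.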

Approaches to accelerate and enhance metabolite identification

Bioanalytical technologies that are expanding the scope of molecules measured in nontargeted metabolomic and lipidomic studies can also be applied to further efforts in metabolite identification. Herein we describe the use of Sapient’s rapid liquid chromatography-mass spectrometry (rLC-MS) systems to compile a metabolite identification database with multi-parameter data from thousands of chemical standards that enables rapid molecule matching and from which identifications that meet Tier 1 reporting criteria can be made.

The rLC-MS system is a high-throughput instrument capable of capturing more than 15,000 chromatographic features in a single biosample. Utilizing this system, we collected m/z and retention time measurements for more than 10,000 chemical standards to enhance compound identification efforts. Standards were selected with a focus on metabolites and lipids that are central to human biology and disease and that are routinely captured in human biosamples. These include endogenous human metabolites and lipids, plant- and microbiome-derived small molecules, and molecules originating from food and food additives, FDA-approved drugs, environmental toxicants, and pollutants (Figure 1).

Figure 1. (A) Chemical superfamilies represented in the reference standards measured via rLC-MS. (B) Chemical space plot visualizing the compound structures of the reference standards measured via rLC-MS. Clusters represent groupings of structurally related compounds.

Utilizing the speed and sensitivity of the Bruker timsTOF Pro 2 parallel accumulation serial fragmentation (PASEF®) technology, MS2 and collision cross section (CCS) information was collected at multiple collision energies in both positive and negative ion modes for all observed adducts. By isolating each parent signal in the m/z and CCS dimensions, high-quality de-noised MS2 spectra can be obtained at collection speeds of up to 100 Hz. This allows for good MS2 depth of coverage despite the very fast time scales of the rLC chromatography system, where peak widths average about 1 second. Bringing these datasets together, we can leverage multiple orthogonal chemical parameters, including reference retention time, m/z, and MS2 chemical fingerprint, to allow MSI Tier 1 assignment for over 1,000 metabolites and lipids routinely measured in human biosamples.
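To illustrate how orthogonal parameters narrow down candidates, the following minimal sketch filters a reference library by m/z, retention time, and CCS. It is not the rLC-MS database implementation: the `Reference` class, the tolerance defaults, and the retention time and CCS values in the toy library are all hypothetical and invented for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reference:
    name: str
    mz: float      # m/z of the measured adduct
    rt_s: float    # retention time in seconds
    ccs: float     # collision cross section in square angstroms

def ppm_error(observed: float, reference: float) -> float:
    """Mass error of the observed m/z relative to the reference, in ppm."""
    return (observed - reference) / reference * 1e6

def match_feature(mz, rt_s, ccs, library,
                  ppm_tol=10.0, rt_tol_s=1.0, ccs_tol_pct=2.0):
    """Keep references within all three tolerances (hypothetical defaults)."""
    hits = []
    for ref in library:
        if abs(ppm_error(mz, ref.mz)) > ppm_tol:
            continue
        if abs(rt_s - ref.rt_s) > rt_tol_s:
            continue
        if abs(ccs - ref.ccs) / ref.ccs * 100 > ccs_tol_pct:
            continue
        hits.append(ref)
    return hits

# Toy library: the m/z values are the [M+H]+ ions of two isomers; the
# retention times and CCS values are invented placeholders.
library = [
    Reference("leucine", 132.1019, 12.4, 130.0),
    Reference("isoleucine", 132.1019, 11.1, 129.0),
]

hits = match_feature(mz=132.1021, rt_s=12.3, ccs=130.5, library=library)
print([ref.name for ref in hits])
```

In this toy case both isomers pass the m/z filter, and only the retention time filter separates them, which is the practical value of collecting more than one parameter per standard.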

This metabolite identification database of extensive multi-parameter chemical data can now be leveraged to rapidly match small molecule biomarkers found in human and non-human studies against a large set of highly relevant reference standards, maximizing the likelihood of definitive identifications and, in turn, improving the biomarker’s translatability. The data can also be leveraged when a molecule of interest has no exact match in the database. In such cases, structural similarities with existing compounds can classify the molecule into a chemical class, and spectral similarity can support Tier 2 annotation.
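Spectral similarity of this kind is often scored with a cosine measure over binned fragment intensities. The sketch below shows that general technique under stated assumptions and is not the scoring used in the database described here: the function names, the 0.01 m/z bin width, and the example peak lists are hypothetical.

```python
import math

def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned

def cosine_similarity(spectrum_a, spectrum_b, bin_width=0.01):
    """Cosine of the angle between two binned spectra (0 = none, 1 = same)."""
    a = bin_spectrum(spectrum_a, bin_width)
    b = bin_spectrum(spectrum_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Invented peak lists for illustration: (fragment m/z, intensity).
query = [(86.0964, 100.0), (44.0495, 20.0)]
similar = [(86.0964, 90.0), (44.0495, 30.0), (69.0699, 5.0)]
unrelated = [(70.0651, 50.0)]

print(round(cosine_similarity(query, similar), 3))
print(cosine_similarity(query, unrelated))
```

Fixed-width binning is a simplification, since two peaks that straddle a bin edge will not be paired; tolerance-based peak matching avoids that at the cost of more code.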

Conclusion

The goal of any nontargeted metabolomic or lipidomic study is to discover the most biologically important signals in a patient population, whether they are known molecules or yet to be mapped. Improvements in the speed, quality, and extent of metabolite identification are needed to unlock the insights that unknown but statistically significant metabolites and lipids hold. Confident identification allows for accurate mapping of the molecule in biological pathways to drive deeper or novel understanding of biochemical processes, disease mechanisms, and biomarkers.

New technologies are now enabling accelerated and enhanced processes for metabolite identification. We explored how next-generation rLC-MS systems have been able to rapidly generate retention time, m/z, and MS2 data at scale, across thousands of chemical reference standards, to maximize definitive identifications for metabolites and lipids found to be of interest in a study. These identifications meet the stringent Tier 1 identification criteria set forth by MSI to ensure the highest confidence metabolite identifications possible. Such innovations are generating more accurate, reproducible, and meaningful insights from nontargeted metabolomics and lipidomics data, ultimately accelerating the translation of metabolite and lipid biomarker discoveries into clinical applications that advance drug development and deployment.