Article | January 30, 2026

Mapping Human Metabolic Diversity with Foundation Models: A DynamiQ Approach

Modern drug development is fundamentally complicated because human biology is heterogeneous. Patients who share the same diagnosis, genetic mutation, or baseline laboratory values often respond very differently to the same therapy. They may experience divergent toxicities and progress along distinct disease trajectories with varied outcomes. These differences reflect variation in underlying physiological and metabolic state, which cannot be comprehensively captured by genomics, routine clinical measures, or single biomarkers alone. Unlocking such insights requires development of a metabolomics foundation model.

AI-enabled foundation models have emerged as a powerful way to explain more of this complex biology: they integrate large-scale, heterogeneous datasets to learn a general, reusable representation of human biological states. Rather than optimizing for a single endpoint, disease, or predefined outcome, a representative foundation model is trained on molecular, clinical, imaging, and real-world data (RWD) collected from both healthy individuals and diverse patient populations – with different demographics, diagnoses, exposures, and outcomes – to learn the commonalities and distinctions that govern how biology behaves across people and over time. Once trained, the representation can be reused across therapeutic areas, programs, and stages of development.

Using large-scale, nontargeted metabolomics datasets, such a model can learn a map of human metabolic states: a coordinate system into which any biological sample can be placed and interpreted to discern the distinct metabolic phenotype, or “metabotype”, of an individual relative to the full spectrum of human physiology.

How It Works: Building a Representative Metabolomics Foundation Model

A foundation model is different from a traditional predictive model in that it is trained using self-supervised learning, rather than outcome-driven prediction. During training, the model learns to reconstruct masked metabolites, preserve co-variation structure, and maintain stable relationships across pathways and biochemical processes, mapping the intrinsic organization of human metabolic biology. Because the representation is learned from population‑scale metabolic structure rather than labeled outcomes, it is more robust to cohort shift, missing data, and sparsely observed phenotypes.
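To make the masked-reconstruction idea concrete, here is a minimal sketch on synthetic data. It is not Sapient's model or training code: a linear encoder and decoder obtained by PCA stand in for a trained neural network, and the data matrix, the 15% masking rate, and all variable names are assumptions chosen for illustration. The sketch hides a random subset of metabolite values in held-out samples, reconstructs them from the remaining values, and compares the error with a baseline that simply fills in the population mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a samples x metabolites matrix (not DynamiQ data):
# 30 features driven by 3 latent "pathway" factors plus measurement noise.
n_samples, n_features, n_latent = 300, 30, 3
factors = rng.normal(size=(n_samples, n_latent))
loadings = rng.normal(size=(n_latent, n_features))
X = factors @ loadings + 0.1 * rng.normal(size=(n_samples, n_features))

# Hold out 100 samples; standardize with statistics from the training rows only.
X_train, X_test = X[:200], X[200:]
mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sd
X_test = (X_test - mu) / sd

# Label-free "pretraining": learn a low-dimensional encoder/decoder from the
# co-variation structure alone (PCA via SVD; an assumed stand-in for a network).
_, _, Vt = np.linalg.svd(X_train, full_matrices=False)
W = Vt[:n_latent].T  # shape (n_features, n_latent)


def encode(X_in):
    """Map samples into the learned latent space."""
    return X_in @ W


def decode(Z):
    """Map latent coordinates back to metabolite space."""
    return Z @ W.T


def masked_mse(X_true, X_hat, mask):
    """Mean squared error computed on the masked entries only."""
    return float(((X_hat - X_true)[mask] ** 2).mean())


# Hide about 15% of the held-out values; the encoder sees 0 (the training mean) there.
mask = rng.random(X_test.shape) < 0.15
X_in = np.where(mask, 0.0, X_test)
X_hat = decode(encode(X_in))

model_loss = masked_mse(X_test, X_hat, mask)
baseline_loss = masked_mse(X_test, np.zeros_like(X_test), mask)  # mean imputation
print(f"masked MSE, learned representation: {model_loss:.3f}")
print(f"masked MSE, mean-imputation baseline: {baseline_loss:.3f}")
```

Because the synthetic features co-vary through a few shared factors, the hidden values can be largely recovered from the visible ones. That is the property a masked-reconstruction objective rewards, and no outcome labels are involved at any point.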

This allows the model to capture latent biological states that drive differential drug response and disease progression. The representation is learned once and reused across many downstream applications.
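The "learned once, reused many times" pattern can be sketched as a frozen encoder with a small, task-specific head fitted per application. The example below is illustrative only: the data are synthetic, the encoder is again a PCA projection standing in for a pretrained model, and the two endpoint names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: 400 samples, 40 features driven by 4 latent physiological states.
n, p, k = 400, 40, 4
state = rng.normal(size=(n, k))
X = state @ rng.normal(size=(k, p)) + 0.2 * rng.normal(size=(n, p))
X = X - X.mean(axis=0)

# Learned once, without labels, then frozen (PCA is an assumed stand-in).
_, _, Vt = np.linalg.svd(X, full_matrices=False)
encoder = Vt[:k].T
Z = X @ encoder  # embeddings reused by every downstream task

# Two hypothetical downstream endpoints that depend on different latent states.
endpoints = {
    "hypothetical_response_score": state[:, 0] + 0.3 * rng.normal(size=n),
    "hypothetical_toxicity_score": state[:, 2] - state[:, 3] + 0.3 * rng.normal(size=n),
}


def fit_linear_head(Z_train, y_train):
    """Least-squares linear head (with intercept) on top of frozen embeddings."""
    A = np.column_stack([Z_train, np.ones(len(Z_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return coef


def predict(Z_new, coef):
    """Apply a fitted linear head to new embeddings."""
    return np.column_stack([Z_new, np.ones(len(Z_new))]) @ coef


train, test = slice(0, 300), slice(300, n)
r2 = {}
for name, y in endpoints.items():
    coef = fit_linear_head(Z[train], y[train])
    resid = y[test] - predict(Z[test], coef)
    r2[name] = 1.0 - resid.var() / y[test].var()
    print(f"{name}: held-out R^2 = {r2[name]:.2f}")
```

Only the small linear heads are fitted per endpoint; the encoder itself is never retrained, which is what makes the representation reusable across programs.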

Sapient’s DynamiQ™ Dataset: Enabling True Metabolomics Foundation Modeling

Most metabolomics datasets are small, disease-specific, and sparsely phenotyped, making them insufficient for foundation models to learn generalizable representations of human metabolism. In contrast, Sapient’s DynamiQ plasma database was purpose-built to provide the scale, diversity, and depth of biological data necessary to model human metabolism across broad populations, rather than narrowly defined cohorts.

The dataset includes measurements of over 100,000 metabolites and lipids across more than 67,000 biosamples collected longitudinally from over 13,000 individuals, coupled with deep clinical phenotyping data from electronic health records (EHR), including clinical outcomes. It represents a wide range of ages, disease states, environmental exposures, treatments, and physiological conditions. It captures not only well-annotated metabolites, but also thousands of unannotated or partially characterized molecular features that reflect real biochemical activity in humans. These features encode pathway behavior, metabolic flux, and physiologic stress responses that cannot be read out by genomics, transcriptomics, or proteomics alone, and that routine laboratory values cannot readily distinguish between patients.

Breadth of data is the defining requirement for building a foundation model that is transferable, robust, and biologically meaningful – and the comprehensive metabolomics data with linked RWD available in DynamiQ enables this broad learning of generalizable metabolic representations.

[Figure: Metabolomics foundation model data inputs]

Sapient’s DynamiQ plasma database provides the scale, breadth, and depth of molecular measures needed to model human metabolism across broad populations.

Foundation Model Momentum: Human Biology Domains Already Transformed

Representative foundation models have already transformed other domains of human biology. Population-scale efforts such as the UK Biobank (UKBB) have enabled reusable representations of disease risk and physiology across hundreds of endpoints, supporting discovery well beyond the questions originally asked. In protein biology, the AlphaFold Protein Structure Database has demonstrated how a single, outcome-agnostic model can unlock insight across thousands of proteins, many of which had never been structurally characterized. Large-scale single-cell atlases have similarly produced foundation models of cell state that are now routinely used to interpret drug mechanism, resistance, and toxicity.

Human metabolic state mapping has lagged in this foundation model revolution not because it is less important, but because datasets of sufficient scale and depth have not yet existed. Sapient’s DynamiQ dataset closes this gap.

Therapeutic Area Impact: Where DynamiQ-Enabled Foundation Models Fit Best

A representative foundation model of metabolism is most valuable in therapeutic areas where current physiology strongly determines therapeutic response, toxicity, and persistence, and where traditional biomarkers explain only a fraction of observed heterogeneity in disease and/or drug response.

For example, in cardiometabolic disease – including obesity, type 2 diabetes, heart failure, and chronic kidney disease – metabolism is a core disease driver. Patients with identical diagnoses often occupy very different metabolic states, influencing weight loss, glycemic control, cardiovascular outcomes, and tolerability. A study across more than 6,900 individuals in Sapient’s DynamiQ database found that the individuals could be grouped into multiple “metabotype” clusters with striking and consistent differences between them. One metabotype showed a high risk of cardiovascular disorders, diabetes, and in particular chronic kidney disease (CKD), but was not significantly associated with traditional cardiovascular risk factors such as lipidemia or obesity. In fact, LDL-cholesterol was significantly lower in this subgroup. This shows how metabolic state mapping can reveal clinically meaningful risk phenotypes that are invisible to traditional diagnostic categories and standard biomarkers, enabling more precise patient stratification and a deeper understanding of disease mechanisms.

[Figure: DynamiQ metabolomics foundation model]

Metabotypes obtained from cluster analysis of prevalent and highly variable features across 6,935 individuals in Sapient’s DynamiQ plasma database show subpopulations with striking and consistent differences across the metabolic groups.
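The kind of analysis described in the caption above can be sketched in a few lines. The article does not state which clustering algorithm or thresholds were used, so everything here is an assumption for illustration: synthetic data with three planted groups, an 80% prevalence cutoff, the ten most variable features, and a plain k-means implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic cohort (not DynamiQ data): 300 individuals, 50 features, 3 planted groups.
n_per, n_groups, p = 100, 3, 50
true_group = np.repeat(np.arange(n_groups), n_per)
centers = np.zeros((n_groups, p))
centers[1, :5] = 5.0    # group 1 differs on features 0-4
centers[2, 5:10] = 5.0  # group 2 differs on features 5-9
X = centers[true_group] + rng.normal(size=(n_per * n_groups, p))
# Features 40-49 are rarely detected: about 70% of their values are missing.
X[:, 40:] = np.where(rng.random((X.shape[0], 10)) < 0.7, np.nan, X[:, 40:])

# Step 1: keep prevalent features (detected in at least 80% of samples).
prevalence = 1.0 - np.isnan(X).mean(axis=0)
prevalent = prevalence >= 0.8
# Step 2: among those, keep the 10 most variable features and standardize them.
variances = np.where(prevalent, np.nanvar(X, axis=0), -np.inf)
selected = np.argsort(variances)[-10:]
X_sel = X[:, selected]
X_sel = (X_sel - X_sel.mean(axis=0)) / X_sel.std(axis=0)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means with farthest-point initialization."""
    init_rng = np.random.default_rng(seed)
    centroids = [X[init_rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d2)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


labels, centroids = kmeans(X_sel, k=3)

# Purity: share of individuals whose cluster's majority group matches their own.
purity = sum(
    np.bincount(true_group[labels == j], minlength=n_groups).max()
    for j in range(3)
) / len(labels)
print(f"selected features: {sorted(selected.tolist())}, cluster purity: {purity:.2f}")
```

In a real cohort there is no planted ground truth, so the number of clusters and their stability would need to be assessed separately, for example by resampling.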

In oncology, metabolic state reflects both tumor-intrinsic biology and host systemic physiology. Response to targeted therapies, antibody-drug conjugates (ADCs), and immuno-oncology agents is shaped by energy availability, nutrient competition, mitochondrial function, inflammation, and cachexia. Metabolic state mapping provides physiologic explanations for responder versus non-responder populations, toxicity clustering, and early discontinuation in settings where genomics alone is insufficient.

In immunology and inflammatory disease, immune activation and suppression are tightly coupled to metabolic programming. In diseases such as rheumatoid arthritis, inflammatory bowel disease, and lupus, metabolic states distinguish patients likely to achieve durable response from those prone to relapse or intolerance, particularly in the setting of secondary non-response.

In central nervous system and neurodegenerative disease, metabolic dysfunction often precedes clinical symptoms and influences progression. Energy utilization, lipid metabolism, and systemic inflammation play critical roles in neuronal resilience, making metabolic state a valuable lens for understanding heterogeneity in CNS trials.

Clinical Trial Intelligence: Explaining and Improving Trials with Metabolic State

Clinical trial intelligence refers to the use of high-dimensional biological data and advanced modeling to explain trial outcomes, reduce uncertainty, and inform future trial design, without making patient-level treatment decisions. It is already embedded in translational medicine workflows through exploratory biomarkers, post-hoc analyses, and mechanistic interpretation in clinical study reports.

A representative metabolomics foundation model is particularly powerful in this context because metabolic state reflects current human physiology, including the integrated effect of disease, environment, comorbidities, and drug exposure. It can reveal the distinct metabotypes occupied by responders vs. non-responders vs. patients who experience adverse events, even when they appear similar by conventional clinical criteria.
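One simple way to ask whether responders and non-responders occupy different regions of a learned metabolic map is to compare group centroids in embedding space and calibrate the difference with a permutation test. The sketch below uses synthetic embeddings with a planted shift; the group sizes, embedding dimension, and effect size are assumptions, and it is meant as an exploratory illustration rather than a description of any specific trial analysis.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical precomputed embeddings for 120 trial participants (8 dimensions).
n_resp, n_non, dim = 60, 60, 8
shift = np.zeros(dim)
shift[0] = 1.5  # planted difference along one latent axis
Z_resp = rng.normal(size=(n_resp, dim)) + shift
Z_non = rng.normal(size=(n_non, dim))
Z = np.vstack([Z_resp, Z_non])
is_responder = np.array([True] * n_resp + [False] * n_non)


def centroid_distance(Z, groups):
    """Euclidean distance between the centroids of the two groups."""
    return float(np.linalg.norm(Z[groups].mean(axis=0) - Z[~groups].mean(axis=0)))


observed = centroid_distance(Z, is_responder)

# Null distribution: shuffle the responder labels and recompute the distance.
n_perm = 2000
null = np.array([
    centroid_distance(Z, rng.permutation(is_responder)) for _ in range(n_perm)
])
p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
print(f"centroid distance = {observed:.2f}, permutation p-value = {p_value:.4f}")
```

The same comparison can be repeated for patients who experienced adverse events, or replaced by a cluster-membership comparison when discrete metabotypes are available.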

Because the representation is learned independently of the trial and is not used for enrollment, stratification, or treatment assignment, it can be applied across Phase I-III programs without regulatory friction. The output is explanatory and strategic rather than prescriptive, making it suitable for both retrospective and blinded prospective analyses.

A DynamiQ Way to Decipher Disease Pathobiology & Drug Response

Metabolomics foundation modeling can uniquely capture human physiology in a way that fills the largest blind spot in drug development decision-making today. Sapient has established DynamiQ as a population-scale reference database of human metabolism that reflects real physiological diversity at a level not accessible through genomics or routine clinical measures.

By integrating deep, nontargeted metabolomics with longitudinal clinical phenotyping across tens of thousands of human samples, we can uncover meaningful physiological differences among patients who otherwise appear similar by diagnosis, baseline labs, or exposures – differences that can often be critical in influencing disease progression, therapeutic response, and drug tolerability and persistence.

By adding a physiology-first layer of interpretation alongside existing biomarkers and biostatistics, we help development teams explain heterogeneity, anticipate risk, and make more confident decisions about dose, design, and population selection in subsequent trials.

If you are interested in leveraging DynamiQ and metabolomics foundation models to better understand disease biology, patient heterogeneity, and therapeutic response, reach out to our team at discover@sapient.bio.