Understanding the risk factors that give rise to human disease is essential to early detection and development of effective treatments for disease prevention. Human disease risk represents an interaction between underlying genetic predisposition, which is largely set from the moment of conception, and the varied and changing exposures that occur over an individual’s lifetime, including from diet, lifestyle, the environment, internal organs, and the microbiome.
The degree of risk that originates from genetic vs. non-genetic factors has been modeled using a variety of tools, including the study of monozygotic and dizygotic twins (Rappaport 2016). These studies have found that the majority of lifetime attributable disease risk is related to non-genetic causes, particularly for complex, heterogeneous diseases such as diabetes.

Figure 1. Population attributable fractions for multiple disease phenotypes, including diabetes, estimated from studies of mono- and dizygotic twins. Rappaport PLoS. 2016.
Metabolic biomarker profiling: elucidating non-genetic disease risk
Effective mapping of disease risk across populations requires measurement of non-genetic factors which are not encoded in genetic sequence alone. These non-genetic factors can be captured in part through circulating small molecule biomarkers. Small molecules can be produced endogenously in human cells, in both healthy and diseased tissue, and cross cellular and biological barriers into central circulation. Small molecules are also introduced into blood from the world around us – what we eat, drink, smell, smoke, from the microbes that inhabit our gut, from external environmental exposures, and more. These molecules circulate throughout the body and cause effects over time, making them ideal biomarkers of disease risk and early disease detection.
The mapping of small molecule chemistries, often referred to as metabolomics, is not a new concept, yet only a portion of small molecule biomarkers have been characterized to date, and in general, across relatively small populations. The challenge has been in achieving analytical scale and enabling approaches that allow us to capture data from tens of thousands of individuals.
Measuring metabolic changes at scale and over time
For the last decade, Sapient’s team has been tackling the technical challenges that have limited our ability to measure diverse small molecule biomarkers across human populations. This work has resulted in the development and optimization of our rapid liquid chromatography-mass spectrometry (rLC-MS) systems: next-generation, proprietary technologies for nontargeted mass spectrometry analysis of human biosamples. These tools enable ultra-high throughput, agnostic profiling of thousands of circulating small molecule factors across multiple chemical domains. The vast majority of the biomarkers discovered via rLC-MS are uncharacterized, though they can be accurately quantified and statistically analyzed – amplifying discovery potential to a new order of magnitude.
In contrast to genomics, circulating small molecule biomarkers are highly dynamic, and can fluctuate in response to changes in internal physiology or external exposures. To limit the potential for reverse causality, we leverage the discovery potential of longitudinal studies. These approaches sample a healthy individual’s blood and then follow that person over several decades as they move to a diabetic state. In assaying the ‘pre-disease’ blood samples, we can identify specific biomarkers that reveal individuals who will go on to develop diabetes years prior to disease onset. Early studies by other groups in relatively small populations with relatively few measures have demonstrated the potential for these longitudinal discoveries (T. Wang et al.); Sapient’s approach aims to greatly expand upon and extend these landmark studies through the measurement of tens of thousands of circulating biomarkers across tens of thousands of individuals.
A longitudinal study of tens of thousands of individuals: predicting diabetes by looking back
Figure 2 below represents an rLC-MS analysis of over 40,000 circulating small molecule factors in tens of thousands of individuals from many different studies collected from dozens of sites around the world. These individuals have diverse socioeconomic backgrounds, diets, and lifestyles, and have been followed for up to two decades.

Figure 2. Across a study of tens of thousands of diverse individuals, we find hundreds of metabolites that associate with incident diabetes development >10 years in advance.
Through time-to-event analysis, we found hundreds of biomarkers present in ‘pre-disease’ states that predict the development of incident diabetes more than 10 years in advance. Each blue dot on the plot represents a small molecule that cross-validates across different studies and diverse populations, confirming the robust nature of the discovery and consistency in findings despite the heterogeneity of the underlying populations.
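The article does not say which time-to-event model was used. As an illustration only, a common choice for this kind of per-biomarker analysis is the Cox proportional hazards model, sketched below. All symbols are assumptions introduced here: x_{ij} is the standardized level of biomarker j in person i's pre-disease sample, z_i is a vector of covariates such as age and sex, and h_0(t) is an unspecified baseline hazard.

```latex
h_i(t) = h_0(t)\,\exp\!\left(\beta_j x_{ij} + \boldsymbol{\gamma}^{\top}\mathbf{z}_i\right),
\qquad
\mathrm{HR}_j = e^{\beta_j}
```

Under this sketch, a hazard ratio HR_j above 1 would mean that higher baseline levels of biomarker j are associated with earlier onset of diabetes during follow-up, and a value below 1 with later onset or lower risk.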
Machine learning: bridging the gap from data to insight
While it is now possible to generate large-scale small molecule biomarker data very rapidly, the ability to interpret that data for actionable insight and knowledge has not scaled at the same rate. Machine learning (ML) now makes it possible to unite voluminous, multi-dimensional data assets such as small molecule measures and genomics, and to run rapid, accurate classifier analyses that identify associations with incident disease development.
When we look at long-term disease risk over a ten-year period, as shown in Figure 3, we find that ML-based classifiers using metabolic risk scores (MRS) have tremendous predictive power, clearly differentiating the top 25% of individuals at the highest risk of developing diabetes over a decade. These classifiers also allow us to understand how risk increases over time, and particularly how common risk factors such as obesity interact with MRS to influence disease risk.

Figure 3. ML-based classification of long-term diabetes risk and impact of BMI on MRS to influence disease risk.
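To make the "top 25%" comparison concrete, here is a minimal Python sketch of how a population could be split at the 75th percentile of a risk score and the observed diabetes incidence compared between the two groups. It is not Sapient's classifier: the function name, the score values, and the outcome labels are all hypothetical, and the toy data is made up for illustration.

```python
from statistics import quantiles


def top_quartile_incidence(scores, developed):
    """Compare disease incidence in the top 25% of risk scores with everyone else.

    scores    -- hypothetical per-person risk scores (higher = higher predicted risk)
    developed -- 1 if the person developed diabetes during follow-up, else 0
    """
    if len(scores) != len(developed):
        raise ValueError("scores and developed must have the same length")
    # quantiles(..., n=4) returns three cut points; the last one is the 75th percentile.
    cutoff = quantiles(scores, n=4)[2]
    top = [d for s, d in zip(scores, developed) if s > cutoff]
    rest = [d for s, d in zip(scores, developed) if s <= cutoff]
    if not top or not rest:
        raise ValueError("need at least one person on each side of the cutoff")
    return cutoff, sum(top) / len(top), sum(rest) / len(rest)


# Toy, made-up data for eight people; not taken from the study.
toy_scores = [1, 2, 3, 4, 5, 6, 7, 8]
toy_developed = [0, 0, 0, 1, 0, 0, 1, 1]
cutoff, top_rate, rest_rate = top_quartile_incidence(toy_scores, toy_developed)
print(f"cutoff={cutoff}, top-quartile incidence={top_rate:.2f}, rest={rest_rate:.2f}")
```

The same split could be repeated within BMI categories to look at how a risk score and obesity combine, though a real analysis would also need to account for follow-up time and censoring.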
ML-aided diabetes risk assessment with small molecule biomarkers
Metabolic diseases such as diabetes typically manifest with significant interpersonal variability, causing challenges for disease prediction, prevention, diagnosis, and treatment. Small molecule biomarkers can help to identify the sources of disease heterogeneity as readouts of the unique exposures an individual experiences throughout life. We now have the analytical technologies to efficiently probe this largely yet-to-be-characterized chemical space in humans, and machine learning will enable us to take these large datasets and derive actionable knowledge from the information – including by identifying early biomarkers of disease.

