Understanding the risk factors that give rise to human disease is essential to early detection and development of effective treatments for disease prevention. Human disease risk represents an interaction between underlying genetic predisposition, which is largely set from the moment of conception, and the varied and changing exposures that occur over an individual’s lifetime, including from diet, lifestyle, the environment, internal organs, and the microbiome.

The degree of risk that originates from genetic vs. non-genetic factors has been modeled using a variety of tools, including the study of monozygotic and dizygotic twins (Rappaport 2016). These studies have found that the majority of lifetime attributable disease risk is related to non-genetic causes, particularly for complex, heterogeneous diseases such as diabetes.

Figure 1. Population attributable fractions for multiple disease phenotypes, including diabetes, estimated from studies of monozygotic and dizygotic twins. Rappaport, PLoS, 2016.

Small molecule measures: elucidating non-genetic disease risk

Effective mapping of disease risk across populations requires measurement of non-genetic factors which are not encoded in genetic sequence alone. These non-genetic factors can be captured in part through circulating small molecule biomarkers. Small molecules can be produced endogenously in human cells, in both healthy and diseased tissue, and cross cellular and biological barriers into central circulation. Small molecules are also introduced into blood from the world around us – what we eat, drink, smell, smoke, from the microbes that inhabit our gut, from external environmental exposures, and more. These molecules circulate throughout the body and cause effects over time, making them ideal biomarkers of disease risk and early disease detection.

The mapping of small molecule chemistries, often referred to as metabolomics, is not a new concept, yet only a portion of small molecule biomarkers have been characterized to date, and in general, across relatively small populations. The challenge has been in achieving analytical scale and enabling approaches that allow us to capture data from tens of thousands of individuals.

Measuring metabolic changes at scale and over time

For the last decade, Sapient’s team has been tackling the technical challenges that have limited our ability to measure diverse small molecule biomarkers across human populations. This work has resulted in the development and optimization of our rapid liquid chromatography-mass spectrometry (rLC-MS) systems: next-generation, proprietary technologies for nontargeted mass spectrometry analysis of human biosamples. These tools enable ultra-high throughput, agnostic profiling of thousands of circulating small molecule factors across multiple chemical domains. The vast majority of the biomarkers discovered via rLC-MS are uncharacterized, though they can be accurately quantified and statistically analyzed – amplifying discovery potential to a new order of magnitude.

In contrast to genomics, circulating small molecule biomarkers are highly dynamic and can fluctuate in response to changes in internal physiology or external exposures. To limit the potential for reverse causality, we leverage the discovery potential of longitudinal studies. These approaches sample a healthy individual’s blood and then follow that person over several decades as they move to a diabetic state. By assaying the ‘pre-disease’ blood samples, we can identify specific biomarkers that reveal, years prior to disease onset, which individuals will go on to develop diabetes. Early studies by other groups in relatively small populations with relatively few measures have demonstrated the potential for these longitudinal discoveries (T. Wang et al.); Sapient’s approach aims to greatly expand upon and extend these landmark studies through the measurement of tens of thousands of circulating biomarkers across tens of thousands of individuals.

A longitudinal study of tens of thousands of individuals: predicting diabetes by looking back

Figure 2 below represents an rLC-MS analysis of over 40,000 circulating small molecule factors in tens of thousands of individuals, drawn from many different studies and collected at dozens of sites around the world. These individuals have diverse socioeconomic backgrounds, diets, and lifestyles, and have been followed for up to two decades.

Figure 2. Across a study of tens of thousands of diverse individuals, we find hundreds of metabolites that associate with incident diabetes development >10 years in advance.

Through time-to-event analysis, we found hundreds of biomarkers present in ‘pre-disease’ states that predict the development of incident diabetes more than 10 years in advance. Each blue dot on the plot represents a small molecule that cross-validates across different studies and diverse populations, confirming the robust nature of the discovery and consistency in findings despite the heterogeneity of the underlying populations.
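As a rough illustration of what a time-to-event analysis works with, the sketch below computes a Kaplan-Meier estimate of diabetes-free survival for two groups split by the baseline level of a single biomarker. The source does not say which time-to-event method was used, so the choice of Kaplan-Meier is an assumption, and the function name, group labels, and follow-up data are all hypothetical toy values rather than study results.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of diabetes-free survival.

    times  : follow-up in years until diagnosis or end of follow-up
    events : 1 if diabetes was diagnosed at that time, 0 if censored
    Returns a list of (time, survival probability) at each diagnosis time.
    """
    diagnoses = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)          # everyone leaving the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = diagnoses.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk people who stay disease-free.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: years of follow-up and diagnosis flags for
# individuals with a high vs. low baseline level of one biomarker.
high_biomarker = kaplan_meier([2, 4, 5, 8, 10], [1, 1, 0, 1, 0])
low_biomarker = kaplan_meier([6, 9, 10, 10, 10], [1, 0, 0, 0, 0])

for label, curve in (("high", high_biomarker), ("low", low_biomarker)):
    for year, prob in curve:
        print(f"{label} biomarker: year {year}, diabetes-free {prob:.2f}")
```

Because censored individuals (those not diagnosed by the end of follow-up) still count toward the risk set until they leave it, this kind of estimate uses all of the follow-up time rather than only the individuals who developed diabetes.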

Machine learning: bridging the gap from data to insight

While it is now possible to generate large-scale small molecule biomarker data very rapidly, the ability to interpret that data for actionable insight and knowledge has not scaled at the same rate. Machine learning (ML) makes it practical to unite voluminous, multi-dimensional data assets such as small molecule measures and genomics, and enables rapid, accurate classifier analysis to identify associations with incident disease development.

When we look at long-term disease risk over a ten-year period, as shown in Figure 3, we find that ML-based classifiers using metabolic risk scores (MRS) have tremendous predictive power, clearly differentiating the top 25% of individuals at the highest risk of developing diabetes over a decade. This approach also allows us to understand how risk increases over time, and particularly how common risk factors such as obesity interact with MRS to influence disease risk.

Figure 3. ML-based classification of long-term diabetes risk and impact of BMI on MRS to influence disease risk.
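To make the idea of a risk score and a top-25% group concrete, here is a minimal sketch that assumes the MRS is a weighted sum of standardized metabolite levels and flags individuals above the 75th percentile. The article does not describe how the MRS is actually constructed, so the weighted-sum form, the metabolite names, the weights, and the cohort values are all hypothetical.

```python
from statistics import quantiles


def metabolic_risk_score(levels, weights):
    """Weighted sum of standardized metabolite levels (assumed MRS form)."""
    return sum(weights[name] * levels[name] for name in weights)


def top_quartile_flags(scores):
    """Mark each score that lies above the 75th percentile of the cohort."""
    cutoff = quantiles(scores, n=4)[2]  # third quartile cut point
    return [score > cutoff for score in scores]


# Hypothetical weights, e.g. as a fitted model might assign them.
weights = {"metabolite_a": 0.8, "metabolite_b": -0.5}

# Hypothetical standardized (z-scored) baseline levels for eight individuals.
cohort = [
    {"metabolite_a": 2.0, "metabolite_b": -0.8},
    {"metabolite_a": 1.5, "metabolite_b": -0.6},
    {"metabolite_a": 1.0, "metabolite_b": 0.2},
    {"metabolite_a": 0.5, "metabolite_b": 0.0},
    {"metabolite_a": 0.0, "metabolite_b": 0.0},
    {"metabolite_a": -0.5, "metabolite_b": 0.2},
    {"metabolite_a": -1.0, "metabolite_b": 0.4},
    {"metabolite_a": 0.25, "metabolite_b": 0.0},
]

scores = [metabolic_risk_score(person, weights) for person in cohort]
flags = top_quartile_flags(scores)

for index, (score, high_risk) in enumerate(zip(scores, flags), start=1):
    print(f"individual {index}: MRS {score:+.2f}, top quartile: {high_risk}")
```

In practice the weights would come from a model trained on one set of individuals and the quartile cutoff would be evaluated on a separate set, so that the stratification reflects out-of-sample performance.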

ML-aided diabetes risk assessment with small molecule biomarkers

Metabolic diseases such as diabetes typically manifest with significant interpersonal variability, causing challenges for disease prediction, prevention, diagnosis, and treatment. Small molecule biomarkers can help to identify the sources of disease heterogeneity as readouts of the unique exposures an individual experiences throughout life. We now have the analytical technologies to efficiently probe this largely uncharacterized chemical space in humans, and machine learning will enable us to take these large datasets and derive actionable knowledge from the information, including by identifying early biomarkers of disease.
