Media | September 18, 2025

Bioanalysis Zone Feature: Scaling biomarker discovery—more data, deeper insights

Biomarker discovery is changing alongside advances in multi-omics data generation and analysis. Technological innovations are enabling the measurement of thousands of molecules per sample with unprecedented speed and resolution, while AI and machine learning tools allow that data to be processed more efficiently and the biomarkers discovered to be characterized more deeply, informing better drug development decisions.

In this interview with Bioanalysis Zone, Sapient’s Co-Founders Dr. Jeramie Watrous (Head of Analytical R&D) and Dr. Tao Long (Head of Data Science) discuss bridging the gap between more data and deeper insights to scale biomarker-driven drug development.

Read an excerpt from the full feature below.

Once the data is generated, what’s the process for turning that raw information into insights or action?

Jeramie: Data processing is a critical bridge between data generation and omics data analysis for insight delivery. For metabolomics, we use Sapient’s proprietary software suite, which performs peak extraction and alignment across thousands of samples and includes a metabolite identification pipeline that leverages our comprehensive in-house standards library to identify the known molecules captured. For proteomics, we’ve generated Sapient’s proprietary tissue-specific protein references and leverage the latest AI-based tools for spectral matching, FDR estimation, protein group quantification, and intensity normalization. From there, the analysis-ready data is handed off to Tao’s data science team for deeper interpretation.
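Sapient’s software suite is proprietary, but the general shape of a library-based identification step can be sketched. The snippet below is a minimal, hypothetical illustration only: it matches detected peaks to library standards by mass-to-charge ratio and retention time. The tolerances, class names, and example values are assumptions for illustration, not Sapient’s actual parameters or pipeline.

```python
from dataclasses import dataclass

# Hypothetical matching tolerances; real pipelines tune these per
# instrument and chromatography method.
MZ_TOL_PPM = 10.0    # mass-to-charge tolerance, parts per million
RT_TOL_MIN = 0.25    # retention-time tolerance, minutes

@dataclass
class Peak:
    mz: float          # measured mass-to-charge ratio
    rt: float          # retention time in minutes
    intensity: float   # integrated peak area

@dataclass
class Standard:
    name: str          # metabolite name from an in-house standards library
    mz: float
    rt: float

def ppm_error(measured: float, reference: float) -> float:
    """Mass error in parts per million."""
    return abs(measured - reference) / reference * 1e6

def identify(peaks: list[Peak], library: list[Standard]) -> dict[str, Peak]:
    """Assign each library standard its best-matching peak, if any,
    by m/z and retention-time proximity."""
    hits: dict[str, Peak] = {}
    for std in library:
        candidates = [
            p for p in peaks
            if ppm_error(p.mz, std.mz) <= MZ_TOL_PPM
            and abs(p.rt - std.rt) <= RT_TOL_MIN
        ]
        if candidates:
            # Keep the most intense candidate as the putative identification.
            hits[std.name] = max(candidates, key=lambda p: p.intensity)
    return hits

if __name__ == "__main__":
    library = [Standard("citrate", 191.0197, 4.1), Standard("hippurate", 178.0510, 6.8)]
    peaks = [Peak(191.0199, 4.15, 2.3e6), Peak(178.0508, 6.72, 8.9e5), Peak(301.1, 9.0, 1e5)]
    for name, peak in identify(peaks, library).items():
        print(f"{name}: m/z {peak.mz:.4f} @ {peak.rt:.2f} min, intensity {peak.intensity:.2e}")
```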

Tao: Turning this processed multi-omics data into actionable insights relies on three foundational components: talent, infrastructure and tools. When it comes to talent, you really need a cross-functional team that brings complementary expertise in biology, statistics, machine learning (ML) and data science, plus a strong understanding of laboratory processes. This allows us to interpret complex data in a meaningful way and answer the real biological question being asked. At Sapient, we’ve built a diverse biocomputational team that contributes all of these perspectives, elevating the quality of our omics data analysis capabilities and, therefore, the insights we can deliver.

In terms of infrastructure, you have to consider the sheer volume of data that multi-omics can generate. Sometimes there are multiple terabytes of data to be analyzed in a single study. We leverage both on-premise and cloud-based infrastructure to process this data in parallel, enabling much faster computation.
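As a rough illustration of the fan-out pattern Tao describes, the sketch below parallelizes per-sample processing across worker processes using Python’s standard library. The directory name, file extension, `process_sample` body, and worker count are hypothetical placeholders; a production system would distribute this work across on-premise nodes or cloud instances rather than a single machine.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def process_sample(path: Path) -> dict:
    """Placeholder for per-sample processing (e.g., peak extraction,
    normalization). Each call is independent, so samples can be
    processed in parallel."""
    raw = path.read_bytes()  # hypothetical raw-file read
    return {"sample": path.stem, "bytes": len(raw)}

def process_study(sample_paths: list[Path], workers: int = 8) -> list[dict]:
    """Fan per-sample work out across processes; results are collected
    in input order once all workers finish."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_sample, sample_paths))

if __name__ == "__main__":
    paths = sorted(Path("raw_data").glob("*.raw"))  # hypothetical study directory
    results = process_study(paths)
    print(f"processed {len(results)} samples")
```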

And finally, you need tools that enable your team to extract maximum value from these large-scale datasets. We use a mix of best-in-class software and algorithms alongside internally developed tools tailored to address specific challenges. For example, we’ve built a proprietary workflow to correct for batch effects, a major issue in multi-omics caused by variability over time, across instruments, or in sample handling. Our tools remove these unwanted variations while preserving the true biological signals.
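Sapient’s batch correction workflow is proprietary, so the sketch below shows only one simple, generic approach to the same problem: per-batch median centering of log-scale intensities, which removes a constant per-batch offset for each feature while leaving within-batch variation intact. The function name, data shapes, and example values are assumptions for illustration.

```python
import numpy as np

def median_center_by_batch(X: np.ndarray, batches: np.ndarray) -> np.ndarray:
    """Align each feature's per-batch median to its global median.

    X       : samples x features matrix of log-transformed intensities
    batches : per-sample batch labels, shape (samples,)
    """
    X_corrected = X.copy()
    global_median = np.median(X, axis=0)           # per-feature global median
    for b in np.unique(batches):
        mask = batches == b
        batch_median = np.median(X[mask], axis=0)  # per-feature median in this batch
        # Shift the batch so its median lands on the global median,
        # removing the batch offset while preserving within-batch signal.
        X_corrected[mask] += global_median - batch_median
    return X_corrected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 3))
    X[3:] += 2.0  # simulated constant offset in batch "B"
    batches = np.array(["A", "A", "A", "B", "B", "B"])
    print(median_center_by_batch(X, batches).round(2))
```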
