In the wake of the COVID-19 pandemic, the scientific community faced an unprecedented challenge: how to efficiently and accurately construct and evaluate phylogenetic trees built from millions of viral genomes. These evolutionary family trees are essential for understanding the origins, mutations, and spread of pathogens, showing when new strains emerge and how they relate to one another. Researchers have long relied on established methods to gauge the reliability of these trees, yet the sheer volume and complexity of data generated during the pandemic rendered such techniques impractical. To address this gap, a team of researchers from EMBL’s European Bioinformatics Institute (EMBL-EBI), in collaboration with the Australian National University, has developed SPRTA, an approach to assessing phylogenetic confidence at pandemic scales.
For decades, the benchmark for evaluating the robustness of phylogenetic trees has been Felsenstein’s bootstrap, a statistical method introduced about 40 years ago. It resamples the data and re-infers the tree for each resample, typically hundreds to thousands of times, to measure how stable each part of the tree is. While effective for smaller datasets, this repetition multiplies the cost of an already expensive tree inference, making the method impractical for the flood of genomic sequences generated during the COVID-19 outbreak. This bottleneck significantly hampered real-time evolutionary analyses and, consequently, rapid public health responses.
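To make the cost of resampling concrete, here is a minimal Python sketch of a bootstrap loop. It is an illustration under stated assumptions, not the procedure used in any particular package: `closest_pair_clade` is a hypothetical toy stand-in for real tree inference, and the four-sequence alignment is made up for the example.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Build one replicate by sampling alignment columns with replacement."""
    n_sites = len(alignment[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def closest_pair_clade(alignment):
    """Hypothetical toy 'inference': the closest pair of sequences forms the only clade."""
    best_distance, best_pair = None, None
    for i in range(len(alignment)):
        for j in range(i + 1, len(alignment)):
            distance = sum(a != b for a, b in zip(alignment[i], alignment[j]))
            if best_distance is None or distance < best_distance:
                best_distance, best_pair = distance, frozenset((i, j))
    return {best_pair}


def bootstrap_support(alignment, infer_clades, n_replicates=100, seed=0):
    """Fraction of replicates in which each clade of the original result reappears."""
    rng = random.Random(seed)
    reference = infer_clades(alignment)
    counts = {clade: 0 for clade in reference}
    for _ in range(n_replicates):
        # The expensive step: inference is repeated from scratch for every replicate.
        replicate_clades = infer_clades(bootstrap_replicate(alignment, rng))
        for clade in reference:
            counts[clade] += clade in replicate_clades
    return {clade: counts[clade] / n_replicates for clade in reference}


# Made-up alignment: sequences 0 and 1 are identical, so they always pair up.
alignment = ["AAAA", "AAAA", "CCCC", "GGGG"]
support = bootstrap_support(alignment, closest_pair_clade, n_replicates=50)
```

The loop calls the inference function once per replicate, so with a real inference method and millions of genomes the total cost becomes the cost of one tree search multiplied by hundreds or thousands.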
SPRTA, short for SPR-based Tree Assessment, revolutionizes this process by serving as the first scalable, interpretable system designed specifically for pandemic-sized datasets. By leveraging subtree pruning and regrafting (SPR) operations, this method systematically explores the neighborhood of a given phylogenetic tree to assess the reliability of each branch. Instead of relying on time-consuming resampling, SPRTA evaluates plausible evolutionary scenarios by virtually rearranging tree branches and quantifying alternative hypotheses. This allows for rapid and nuanced confidence scoring, pinpointing which parts of expansive phylogenies are well-supported and which require cautious interpretation.
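A single SPR move can be illustrated on a toy tree. The Python sketch below assumes a rooted binary tree written as nested tuples with unique leaf names. This representation and the helper names (`prune`, `regraft`, `spr_neighbors`) are chosen for illustration only and are not how MAPLE or IQ-TREE store or rearrange trees.

```python
def nodes(tree):
    """Yield every node (leaf or internal subtree) of a nested-tuple tree."""
    yield tree
    if isinstance(tree, tuple):
        for child in tree:
            yield from nodes(child)


def prune(tree, subtree):
    """Detach `subtree` and collapse its former parent; assumes `subtree` is not the root."""
    if not isinstance(tree, tuple):
        return tree
    left, right = tree
    if left == subtree:
        return right
    if right == subtree:
        return left
    return (prune(left, subtree), prune(right, subtree))


def regraft(tree, subtree, target):
    """Reattach `subtree` on the branch leading to `target`, making them siblings."""
    if tree == target:
        return (tree, subtree)
    if not isinstance(tree, tuple):
        return tree
    return (regraft(tree[0], subtree, target), regraft(tree[1], subtree, target))


def spr_neighbors(tree, subtree):
    """All trees obtained by pruning `subtree` and regrafting it on every remaining branch."""
    remainder = prune(tree, subtree)
    return [regraft(remainder, subtree, target) for target in nodes(remainder)]


tree = (("A", "B"), ("C", "D"))
for candidate in spr_neighbors(tree, "A"):
    print(candidate)
```

In this toy case the five candidates include the original placement of `A` next to `B` along with four alternative placements. Scoring each candidate against the data is what turns this neighborhood into a confidence estimate for the branch.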
Unlike traditional bootstrap approaches that predominantly confirm whether particular clades appear consistently across datasets, SPRTA goes deeper by focusing on ancestor-descendant relationships. This perspective aligns more closely with the actual biological processes underpinning viral evolution during outbreaks. By calculating probabilistic scores for different evolutionary paths, SPRTA identifies not only high-confidence branches but also credible alternative trees that may explain ambiguous segments of the virus’s lineage. This capacity is vital for tracking mutation trajectories and understanding transmission dynamics with precision.
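One simple way to turn scores for competing placements into probabilities is to normalize their likelihoods so they sum to one. The Python sketch below assumes that form for illustration and should not be read as SPRTA's exact formula. The log-likelihood values and the labels `current`, `alt_a`, and `alt_b` are made up.

```python
import math


def placement_support(log_likelihoods):
    """Normalize log-likelihoods of candidate placements into probabilities."""
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    peak = max(log_likelihoods.values())
    weights = {name: math.exp(value - peak) for name, value in log_likelihoods.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical log-likelihoods: the current placement and two alternatives.
scores = {
    "current": math.log(0.6) - 5000.0,
    "alt_a": math.log(0.3) - 5000.0,
    "alt_b": math.log(0.1) - 5000.0,
}
support = placement_support(scores)
```

Here `support["current"]` comes out close to 0.6, with the remaining probability split between the two alternatives. A branch whose current placement holds nearly all of the probability is well supported, while a branch that shares it with alternatives is uncertain.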
One of the distinguishing features of SPRTA is its integration with existing phylogenetic tools that handle large-scale data. It is embedded in MAPLE, a software tool developed by EMBL-EBI that can efficiently construct massive phylogenetic trees from millions of genomes. SPRTA is also available in IQ-TREE, a widely adopted phylogenetic analysis package. These integrations make SPRTA accessible and immediately applicable in diverse evolutionary studies, particularly those centered on pathogen surveillance and outbreak response.
The robustness and utility of SPRTA were demonstrated through its application to a dataset of over two million SARS-CoV-2 genomes, a scale that dwarfs most previous evolutionary analyses. This study showcased its ability to delineate branches with high confidence, flag uncertain placements often attributable to incomplete or noisy sequencing data, and reveal credible alternative evolutionary origins. Such insights allow public health experts and researchers to discern between reliable phylogenetic inferences and those that warrant further scrutiny, thereby enhancing the accuracy of outbreak reconstructions.
SPRTA’s interpretability is another core advantage. By providing straightforward probability scores indicating confidence levels in different tree branches, it empowers researchers to make informed decisions regarding evolutionary hypotheses. Instead of arbitrarily dismissing uncertain branches, scientists can now systematically explore alternative layouts suggested by the data. This level of transparency is crucial for genomic epidemiology, where misinterpretations can lead to flawed policies or misguided containment strategies.
Moreover, SPRTA addresses the pressing need for pandemic preparedness in a changing global health landscape. The COVID-19 crisis revealed how rapidly viruses can disseminate and evolve, stressing the necessity of real-time analysis tools that scale effectively. SPRTA’s innovative design accommodates such demands by drastically reducing computational time while enhancing analytical depth. This positions it as an indispensable resource for future outbreaks, enabling faster responses that could save lives and mitigate societal disruptions.
Dr. Nick Goldman, Group Leader at EMBL-EBI, emphasized SPRTA’s transformative impact by highlighting how the pandemic challenged existing computational frameworks. He noted that the tool delivers both speed and reliability, making it easier for researchers to trust their evolutionary conclusions and swiftly adapt to emerging pathogens. In parallel, Senior Scientist Nicola De Maio underscored the method’s ability to detect which relationships in massive trees are solid and which are tentative, thereby refining the accuracy of genomic surveillance.
The availability of SPRTA as open-source software also fosters collaborative advancements across the global scientific community. By incorporating it into accessible platforms, the developers promote transparent, reproducible, and equitable research practices. As researchers worldwide face ever-increasing volumes of genomic data, tools like SPRTA set new standards for analytical rigor, operational feasibility, and biological insight.
In conclusion, SPRTA represents a landmark advancement in phylogenetic analysis, tailored to the realities of pandemic-scale data. Through ingenious algorithmic innovations and seamless integration with existing tools, it presents a smarter, faster, and more interpretable way to measure confidence in evolutionary trees. By enabling precise tracking of pathogen spread and evolution under tremendous data loads, SPRTA enhances preparedness and responsiveness for both ongoing and future public health crises. This work not only stands as a testament to computational and biological ingenuity but also offers a beacon for scientists striving to understand and control infectious diseases in an interconnected world.
Subject of Research: Phylogenetic confidence assessment in pandemic-scale viral genome datasets
Article Title: Assessing phylogenetic confidence at pandemic scales
News Publication Date: 5-Nov-2025
Web References:
DOI link to Nature article
EMBL-EBI MAPLE tool
IQ-TREE software
Image Credits: Karen Arnott/EMBL-EBI
Keywords: Disease outbreaks, SARS-CoV-2, COVID-19, Phylogenetic analysis, Phylogenetic trees, Virology, Viral infections, Evolutionary biology
Tags: computational biology advancements, COVID-19 genomic analysis, EMBL European Bioinformatics Institute, evolutionary uncertainty quantification, Felsenstein’s bootstrap limitations, genomic epidemiology insights, pandemic-scale data processing, phylogenetic confidence assessment, phylogenetic tree construction, resampling techniques in phylogenetics, SPRTA methodology development, viral genome phylogenetics



