The Evo 2 DNA foundation model has been published in the journal Nature, marking a major advance in how artificial intelligence works with the building blocks of life. Trained on the genomes of more than 100,000 species from across the tree of life, the model pushes the boundaries of biological understanding and genomic design. Its ability to decode a genetic language shared by organisms from woolly mammoths to bacteria represents a significant stride in both genomics and AI-driven biology.
Evo 2 emerges from collaborative efforts by the Arc Institute, NVIDIA, and leading academic institutions such as Stanford University, UC Berkeley, and UC San Francisco. The model’s architecture enables it to scan and interpret massive data sets, revealing intricate patterns in DNA sequences that eluded researchers for decades. Unlike traditional methods that may require years or even decades to unravel comparable genetic insights, Evo 2 can perform these tasks in a fraction of the time with extraordinary accuracy.
Evo 2 was trained on over 9.3 trillion nucleotides sourced from more than 128,000 whole genomes as well as metagenomic data, encompassing bacteria, archaea, phages, plants, humans, and a wide range of other eukaryotic organisms. This expansive training set gives Evo 2 broad coverage of life’s domains and makes it the largest AI model in biology to date. Its predecessor focused on a narrower slice of genetics, chiefly the genomes of single-celled organisms, while Evo 2 can process genetic sequences up to one million nucleotides long at once. This capability is critical for analyzing long-range relationships in genomes, which play pivotal roles in gene expression and organismal complexity.
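To make the context-length figure concrete, here is a minimal Python sketch of how a sequence longer than one million nucleotides could be split into overlapping windows that each fit within that limit. It is an illustration only: the function name `context_windows`, the 100,000-nucleotide overlap, and the windowing strategy itself are assumptions made for this example and are not taken from the Evo 2 code or paper.

```python
from typing import Iterator, Tuple


def context_windows(
    seq_len: int,
    context: int = 1_000_000,   # context length reported in the article
    overlap: int = 100_000,     # hypothetical overlap, chosen for illustration
) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) windows that cover a sequence of seq_len bases.

    Each window is at most `context` bases long, and consecutive windows
    share `overlap` bases so that features near a boundary are seen whole
    in at least one window.
    """
    if context <= 0 or not 0 <= overlap < context:
        raise ValueError("need context > 0 and 0 <= overlap < context")
    step = context - overlap
    start = 0
    while start < seq_len:
        end = min(start + context, seq_len)
        yield start, end
        if end == seq_len:
            break  # the final window reached the end of the sequence
        start += step


# Example: a hypothetical 2.5-million-base chromosome.
for start, end in context_windows(2_500_000):
    print(f"window covers bases [{start:,}, {end:,})")
```

A sequence that already fits within the context length yields a single window, so the same loop works for short and long inputs.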
The foundation of Evo 2’s performance is a novel AI architecture known as StripedHyena 2, a significant evolution from prior models. Training ran on more than 2,000 NVIDIA H100 GPUs on the DGX Cloud AI platform through AWS, and it required rethinking data ingestion and inference strategies for genomics. Compared with its predecessor, the model handles 30 times more data and reasons over eight times as many nucleotides at once, which helps it detect subtle, context-dependent variations within DNA sequences, a task that has historically challenged both machines and human experts.
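Taken at face value, the multipliers above imply rough figures for the predecessor model. The following back-of-envelope Python sketch uses the 9.3 trillion training nucleotides and the one-million-nucleotide context reported in this article, and treats the 30x and 8x factors as exact. That is an assumption, since the quoted factors are almost certainly rounded.

```python
# Figures reported in the article for Evo 2.
EVO2_TRAINING_NUCLEOTIDES = 9_300_000_000_000  # "over 9.3 trillion"
EVO2_CONTEXT_NUCLEOTIDES = 1_000_000           # "up to one million"

# Multipliers quoted relative to the predecessor (assumed exact here).
DATA_FACTOR = 30
CONTEXT_FACTOR = 8

# Integer division keeps the arithmetic exact for these round numbers.
implied_predecessor_data = EVO2_TRAINING_NUCLEOTIDES // DATA_FACTOR
implied_predecessor_context = EVO2_CONTEXT_NUCLEOTIDES // CONTEXT_FACTOR

print(f"Implied predecessor training data: ~{implied_predecessor_data:,} nucleotides")
print(f"Implied predecessor context:       ~{implied_predecessor_context:,} nucleotides")
```

Under these assumptions the predecessor would have been trained on roughly 310 billion nucleotides with a context of about 125,000 nucleotides. The true values may differ because the article gives only rounded multipliers.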
Beyond raw computational prowess, Evo 2 shines in its practical applications. For example, it has demonstrated exceptional proficiency in pinpointing disease-causing mutations in critical human genes such as BRCA1, achieving over 90% accuracy in distinguishing benign from potentially pathogenic variants. This degree of precision can vastly accelerate genetic diagnostics and personalized medicine, reducing both time and financial costs linked to experimental validation in wet-lab environments.
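One common way to use a sequence model for this kind of task is to compare how likely the model finds the reference sequence versus the mutated one: a variant that makes the sequence much less likely is flagged as potentially disruptive. The article does not describe Evo 2’s exact scoring procedure, so the Python sketch below is a toy illustration only. It substitutes a tiny k-mer Markov model for the real network, and every name in it (`train_markov`, `sequence_log_likelihood`, `variant_delta`) and every sequence is hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(seqs, k=2):
    """Fit an order-k Markov model with add-one smoothing.

    This is a toy stand-in for a DNA language model, NOT Evo 2.
    Returns a function log_prob(context, base).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for i in range(k, len(s)):
            counts[s[i - k:i]][s[i]] += 1

    def log_prob(context, base):
        total = sum(counts[context].values()) + len(ALPHABET)
        return math.log((counts[context][base] + 1) / total)

    return log_prob


def sequence_log_likelihood(seq, log_prob, k=2):
    """Sum the log-probability of each base given the k bases before it."""
    return sum(log_prob(seq[i - k:i], seq[i]) for i in range(k, len(seq)))


def variant_delta(ref_seq, pos, alt_base, log_prob, k=2):
    """Log-likelihood of the variant sequence minus that of the reference.

    Strongly negative values mean the substitution looks unusual to the model.
    """
    var_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return (sequence_log_likelihood(var_seq, log_prob, k)
            - sequence_log_likelihood(ref_seq, log_prob, k))


# Toy data: the "genome" is a simple repeat, so any break in the
# repeat pattern should score as less likely than the reference.
toy_log_prob = train_markov(["ACGT" * 50])
toy_reference = "ACGTACGTACGT"
toy_delta = variant_delta(toy_reference, pos=5, alt_base="T", log_prob=toy_log_prob)
print(f"delta log-likelihood for C>T at position 5: {toy_delta:.3f}")
```

In a real pipeline the scores would be compared against variants with known clinical labels to choose a decision threshold; the accuracy figure quoted above comes from the researchers’ own evaluation, not from anything this toy model can reproduce.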
The researchers behind Evo 2 envision the model as a versatile platform, capable of fostering innovations across scientific domains. The model’s publicly accessible code and integration into NVIDIA’s BioNeMo framework encourage the global research community to expand and fine-tune its capabilities. Arc Institute also partnered with the AI research lab Goodfire to develop sophisticated mechanistic interpretability tools that unlock the biological features and genomic motifs Evo 2 learns. This transparency bridges the gap between black-box AI and biological insight, enhancing trust and fostering deeper explorations of genetic code.
One of the potentially transformative applications of Evo 2 lies in synthetic biology and genome engineering. By interpreting genomic languages and evolutionary imprints, the model can guide the design of novel genomes that are as extensive as those found in relatively simple organisms like bacteria. Arc Institute’s work has already demonstrated functional synthetic bacteriophages designed using Evo 2, which could revolutionize approaches to combating antibiotic-resistant infections, a pressing global health crisis.
Evo 2’s capabilities extend to biological specificity. One example cited by the developers is engineering genes so that they are expressed only in specific cell types, such as neurons or liver cells. This kind of precise control could reduce side effects in gene therapy by limiting gene expression to targeted tissues, with significant implications for developing safer, more effective medical interventions.
The AI model builds on an essential truth about biology: evolution has left its fingerprints on nucleotide sequences over millions of years, and these patterns hold keys to molecular interactions and functional dynamics. By assimilating this evolutionary information, Evo 2 has acquired a form of generalized genomic literacy. Its capacity to “think” in the language of nucleotides represents a major step for generative biology, enabling machines to read, write, and even innovate within genetic sequences.
Importantly, the research team has exercised ethical caution in developing Evo 2. The training data explicitly omits pathogens harmful to humans and other complex organisms. Mechanisms have been put in place to prevent the model from generating productive outputs related to these excluded pathogens. Stanford’s Professor Tina Hernandez-Boussard and her team have been instrumental in guiding the responsible development and deployment of Evo 2 to ensure safe, ethical use of this transformative technology.
The scale and ambition of Evo 2 have not only broadened scientific horizons but also set new standards in the realm of biological AI. By creating a foundation model trained on one of the most extensive and diverse biological datasets yet assembled, the Arc Institute and NVIDIA have provided researchers worldwide a powerful tool akin to a universal operating system for genetic research. As co-author Dave Burke suggests, this model represents a “kernel” upon which countless future scientific applications can be built, from genome annotation to therapeutic design.
The enthusiasm surrounding Evo 2 is underscored by its rapid adoption across diverse biological challenges, from predicting Alzheimer’s genetic risk to assessing variant effects in domestic animals. Arc Institute’s open approach, which shares training data, model weights, and code, invites the global research community to innovate, accelerating discovery while maintaining transparency, a critical factor where AI meets biology.
As Evo 2 continues to evolve and integrate with other technological advancements, its impact on medicine, synthetic biology, conservation, and fundamental biological research is poised to expand dramatically. With the power to decode life’s language at scale, this AI model not only deepens human understanding of genetic architecture but also opens avenues for designing life forms and therapies hitherto imagined only in science fiction.
Subject of Research: Not applicable
Article Title: Genome modeling and design across all domains of life with Evo 2
News Publication Date: 4-Mar-2026
Web References:
Arc Institute GitHub: https://github.com/arcinstitute/evo2
NVIDIA BioNeMo Framework: https://github.com/NVIDIA/bionemo-framework
Goodfire mechanistic interpretability tool: https://arcinstitute.org/tools/evo/evo-mech-interp
References:
Brixi, G., Durrant, M.G., Ku, J., et al. (2026). Genome modeling and design across all domains of life with Evo 2. Nature. DOI: 10.1038/s41586-026-10176-5
Image Credits: Arc Institute
Keywords: Artificial intelligence, Life sciences, DNA, Genomics, Genome modeling, Synthetic biology, Evolutionary biology, Machine learning
Tags: AI in computational biology, AI trained on 100000 species genomes, AI-driven biological pattern recognition, collaborative AI biology research, cross-domain genetic code modeling, deep learning for DNA sequences, Evo 2 DNA foundation model, fast genomic decoding with AI, genomic design with artificial intelligence, large-scale genome analysis AI, metagenomic data interpretation AI, NVIDIA AI genomics innovation



