MULTI-evolve: Accelerating the Evolution of Complex Multi-Mutant Proteins

In the vast and extraordinarily complex landscape of protein engineering, the effort to design proteins with enhanced functions has traditionally been arduous. The combinatorial explosion that arises when exploring the space of possible protein variants is staggering. For a protein of merely 100 amino acids, there are 20^100 possible sequences, a number vastly exceeding the count of atoms in the observable universe. This breadth of possibilities renders brute-force experimental approaches impractical, limiting early protein engineering efforts to the examination of only hundreds or thousands of variants, hardly scratching the surface of sequence space.
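As a quick check of the sequence-space arithmetic, the short sketch below computes 20^100 exactly and compares its order of magnitude with roughly 10^80 atoms. The 10^80 figure is a commonly quoted estimate, not a number from the article, and the function name is purely illustrative.

```python
import math

def sequence_space_size(length, alphabet_size=20):
    """Number of distinct sequences of a given length over an alphabet."""
    return alphabet_size ** length

n_sequences = sequence_space_size(100)  # exact integer, 20**100
log10_n = math.log10(n_sequences)       # math.log10 accepts arbitrarily large ints
ATOMS_LOG10 = 80                        # common order-of-magnitude estimate, not from the article

print(f"20^100 is about 10^{log10_n:.1f}")
print(f"that exceeds ~10^{ATOMS_LOG10} atoms by about 10^{log10_n - ATOMS_LOG10:.1f}")
```

Because 20^100 = 2^100 x 10^100, the result has 131 decimal digits, about fifty orders of magnitude more than the atom estimate.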

Machine learning has offered a promising way to transcend these limitations by enabling in silico screening of vast variant pools. However, until now even the most advanced computational methods have required extensive datasets, often tens of thousands of protein measurements and multiple iterative rounds of laboratory verification. Despite these advances, the bottleneck remains the practical ability to synthesize and experimentally characterize new proteins: laboratory constraints restrict researchers to testing only a few hundred protein variants per campaign, severely limiting exploration and innovation.

Addressing this critical challenge, researchers at the Arc Institute have unveiled MULTI-evolve, a novel, AI-guided framework that radically compresses the timeframe and data requirements for directed protein evolution. MULTI-evolve integrates machine learning models trained on compact datasets—roughly 200 strategically selected protein variants—to predict combinations of function-enhancing mutations with unprecedented accuracy. This approach leverages key insights into the nature of epistatic interactions between mutations, enabling exploration of previously inaccessible regions of protein sequence space within weeks, rather than the months that traditional methods demand.

A central innovation of MULTI-evolve is its emphasis on pairwise interactions between mutations, known as epistasis. Prior models trained solely on single-point mutations fail to account for the complex interplay between amino acid changes. MULTI-evolve instead begins by identifying about 15 to 20 beneficial single mutations through a combined assessment of protein language models and experimental data. It then systematically constructs and measures all pairwise combinations of these beneficial mutations, generating a dataset of approximately 100-200 variants rich in epistatic information. Training neural networks on these strategically curated datasets enables the model to infer how multiple mutations synergize, antagonize, or combine additively, making it possible to predict higher-order multi-mutant variants with up to 7 mutations.
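The library-design step described above can be sketched as follows. This is a minimal illustration, assuming activities are compared on a log scale relative to wild type; the mutation labels, the tolerance, and the helper names `pairwise_epistasis` and `classify` are hypothetical, and the study's own analysis may define epistasis differently. It shows why 15 to 20 singles yield roughly 100-200 doubles, and how a measured double reveals synergy or antagonism.

```python
from itertools import combinations
from math import comb

# Hypothetical placeholders for 15 beneficial single mutations (not from the study).
singles = [f"mut{i:02d}" for i in range(1, 16)]

# Every pairwise combination of the candidate singles forms the double-mutant library.
doubles = list(combinations(singles, 2))
print(len(doubles), "doubles from 15 singles;", comb(20, 2), "from 20 singles")

def pairwise_epistasis(f_wt, f_i, f_j, f_ij):
    """Deviation of a double mutant from additivity, on a log-activity scale.

    Additivity means (f_ij - f_wt) == (f_i - f_wt) + (f_j - f_wt).
    """
    return (f_ij - f_wt) - ((f_i - f_wt) + (f_j - f_wt))

def classify(eps, tol=0.1):
    """Label an epistasis value; the tolerance is an arbitrary illustrative choice."""
    if eps > tol:
        return "synergistic"
    if eps < -tol:
        return "antagonistic"
    return "additive"

# Toy example: the double is better than the sum of its parts.
eps = pairwise_epistasis(f_wt=0.0, f_i=0.5, f_j=0.25, f_ij=1.0)
print(eps, classify(eps))
```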

Comprehensive benchmarking across 12 diverse, previously published protein datasets validated this approach. Models trained only on single and double mutants successfully predicted the functionality of multi-mutant variants containing up to a dozen combined mutations. Remarkably, these predictions remained robust even when training data were reduced to just 10% of original datasets, underscoring the power of leveraging pairwise mutational insights over brute-force data accumulation.
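To see how singles and doubles alone can yield predictions for larger combinations, the sketch below uses a simple second-order (pairwise) model rather than the neural networks the study trained: single effects and pairwise epistasis terms are read directly from the measurements and summed for any multi-mutant. It assumes log-scale activities and ignores interactions beyond pairs; the function names and toy values are hypothetical.

```python
from itertools import combinations

def fit_pairwise_model(wt, singles, doubles):
    """Derive additive and pairwise terms from measured singles and doubles.

    wt: log-activity of wild type
    singles: {mutation: log-activity}
    doubles: {frozenset({a, b}): log-activity}
    """
    h = {m: y - wt for m, y in singles.items()}  # single-mutation effects
    J = {pair: y - wt - sum(h[m] for m in pair)  # pairwise epistasis terms
         for pair, y in doubles.items()}
    return h, J

def predict(wt, h, J, combo):
    """Predict a multi-mutant's log-activity, assuming no terms beyond pairwise."""
    combo = list(combo)
    total = wt + sum(h[m] for m in combo)
    for a, b in combinations(combo, 2):
        total += J.get(frozenset((a, b)), 0.0)  # unmeasured pairs treated as additive
    return total

# Toy measurements: one synergistic pair (a, b) and one antagonistic pair (b, c).
wt = 0.0
singles = {"a": 1.0, "b": 0.5, "c": 0.25}
doubles = {frozenset("ab"): 2.0, frozenset("ac"): 1.25, frozenset("bc"): 0.5}
h, J = fit_pairwise_model(wt, singles, doubles)
print(predict(wt, h, J, ["a", "b", "c"]))
```

A pairwise model like this is exactly determined by a complete singles-plus-doubles dataset, which is one intuition for why such data carry so much information about higher-order variants.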

The real-world applicability of MULTI-evolve was demonstrated in three new protein engineering campaigns targeting diverse proteins, each yielding substantial improvements. The engineered variants included an APEX peroxidase variant with up to a 256-fold increase in activity over wild type, surpassing previously optimized versions by roughly fivefold. A deactivated CasRx (dCasRx) protein tailored for RNA trans-splicing was optimized for a nearly tenfold increase in function. Lastly, an anti-CD122 antibody was improved to achieve 2.7-fold better binding affinity and a 6.5-fold boost in expression. Each campaign required only a single experimental round testing roughly 100-200 variants, compressing traditionally lengthy iterative workflows into weeks.

Key to the discovery of beneficial mutations within MULTI-evolve is an approach that ensembles multiple protein language models. These models, some based on sequence information and others on predicted three-dimensional structure, generate mutation-impact predictions that are then aggregated through custom scoring schemes. Across 73 diverse protein datasets, this ensemble identified on average nearly twice as many function-enhancing mutations as any single model. For example, the identification of the A134P mutation in APEX, which confers a 53-fold increase in activity, was possible only because the team normalized amino acid substitution biases, specifically the penalty that standard methods impose on proline substitutions.
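The article does not spell out the custom scoring schemes, so the following is only a hypothetical sketch of one way to ensemble model scores while removing a per-amino-acid bias: within each model, scores are z-scored among mutations that share the same target residue (so a proline substitution is compared only with other proline substitutions), then averaged across models. Mutation names, scores, and function names are illustrative, and the notation is assumed to end with the target amino acid (as in A134P).

```python
from collections import defaultdict
from statistics import mean, pstdev

def normalize_by_target_aa(scores):
    """Z-score each mutation within the group sharing its target amino acid.

    scores: {mutation like "A10P": model score}; the last character is the target residue.
    """
    groups = defaultdict(list)
    for mut, s in scores.items():
        groups[mut[-1]].append(s)
    stats = {aa: (mean(v), pstdev(v)) for aa, v in groups.items()}
    out = {}
    for mut, s in scores.items():
        mu, sd = stats[mut[-1]]
        out[mut] = (s - mu) / sd if sd > 0 else 0.0
    return out

def ensemble_rank(model_scores):
    """Average per-model normalized scores and return mutations ranked best-first."""
    normalized = [normalize_by_target_aa(s) for s in model_scores]
    shared = set.intersection(*(set(s) for s in normalized))
    combined = {m: mean(s[m] for s in normalized) for m in shared}
    return sorted(combined, key=combined.get, reverse=True)

# Toy scores from two hypothetical models that both penalize proline globally.
model_a = {"A10P": -3.0, "L20P": -5.0, "A10G": -0.5, "L20G": -1.0, "K30G": -2.0}
model_b = {"A10P": -2.0, "L20P": -4.0, "A10G": -1.0, "L20G": -0.8, "K30G": -1.5}
ranked = ensemble_rank([model_a, model_b])
print(ranked)
```

In raw scores, A10P ranks below most glycine substitutions in both models; after per-residue normalization it ranks first, which illustrates how a systematic proline penalty can hide a strong candidate.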

After identifying these candidate mutations, MULTI-evolve uses fully connected neural networks to predict how different combinations of mutations affect overall protein function. In computational benchmarks, these networks identified top-performing multi-mutants more than half the time across the tested datasets. Experimentally, testing as few as nine proposed multi-mutant variants was enough to validate the model's predictions, illustrating how machine learning can guide highly efficient experimental design.
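A minimal sketch of the idea, assuming a one-hidden-layer network in pure Python: each variant is encoded as a binary vector marking which candidate mutations it carries, the network is trained on wild type, singles, and doubles, and then all 3- and 4-mutation combinations are ranked by predicted activity. The synthetic "ground truth", mutation names, layer size, and learning rate are hypothetical choices for illustration and do not reflect the study's architecture or data.

```python
import math
import random
from itertools import combinations

MUTS = ["m1", "m2", "m3", "m4", "m5", "m6"]  # hypothetical candidate mutations
EFFECTS = {"m1": 0.8, "m2": 0.5, "m3": 0.4, "m4": 0.2, "m5": -0.1, "m6": 0.3}
EPISTASIS = {frozenset(("m1", "m2")): 0.3, frozenset(("m3", "m4")): -0.2}

def true_log_activity(combo):
    """Synthetic ground truth: additive effects plus two pairwise terms."""
    total = sum(EFFECTS[m] for m in combo)
    for pair in combinations(combo, 2):
        total += EPISTASIS.get(frozenset(pair), 0.0)
    return total

def encode(combo, mutations):
    """Binary presence vector: 1.0 if the variant carries the mutation."""
    return [1.0 if m in combo else 0.0 for m in mutations]

class TinyMLP:
    """One tanh hidden layer, linear output, trained with per-example SGD on squared error."""

    def __init__(self, n_in, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.gauss(0, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        return h, sum(w * hk for w, hk in zip(self.W2, h)) + self.b2

    def predict(self, x):
        return self._forward(x)[1]

    def fit(self, X, Y, epochs=400, lr=0.05):
        for _ in range(epochs):
            for x, t in zip(X, Y):
                h, y = self._forward(x)
                g = y - t  # gradient of 0.5 * (y - t)^2 with respect to y
                for k in range(len(h)):
                    gh = g * self.W2[k] * (1.0 - h[k] ** 2)  # backprop through tanh
                    self.W2[k] -= lr * g * h[k]
                    self.b1[k] -= lr * gh
                    for i, xi in enumerate(x):
                        self.W1[k][i] -= lr * gh * xi
                self.b2 -= lr * g

    def mse(self, X, Y):
        return sum((self.predict(x) - t) ** 2 for x, t in zip(X, Y)) / len(X)

# Training set: wild type, all singles, and all pairwise combinations.
train = [()] + [(m,) for m in MUTS] + list(combinations(MUTS, 2))
X = [encode(c, MUTS) for c in train]
Y = [true_log_activity(c) for c in train]

model = TinyMLP(len(MUTS))
before = model.mse(X, Y)
model.fit(X, Y)
after = model.mse(X, Y)

# Rank unseen higher-order combinations by predicted activity.
candidates = [c for k in (3, 4) for c in combinations(MUTS, k)]
ranked = sorted(candidates, key=lambda c: model.predict(encode(c, MUTS)), reverse=True)
print(f"train MSE {before:.3f} -> {after:.3f}; top predicted: {ranked[:3]}")
```

The top few entries of `ranked` would be the handful of multi-mutants sent for experimental testing in a workflow of this shape.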

Physical synthesis of predicted variants is often a bottleneck: commercial DNA synthesis remains costly and time-intensive, especially for complex multi-mutant constructs. To address this, MULTI-evolve introduces MULTI-assembly, a tailored multi-site mutagenesis method that optimizes reaction parameters and uses computational primer design to achieve high assembly efficiency. MULTI-assembly reaches 40-70% efficiency when building constructs carrying up to 9 simultaneous mutations across multiple kilobases, enabling rapid, cost-effective validation cycles that would traditionally take weeks.
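One practical consequence of the 40-70% figure can be worked out directly. Assuming (as an interpretation, not a statement from the article) that "efficiency" means the fraction of picked clones carrying the full intended set of mutations, and that clones are independent, the sketch below computes how many clones to sequence per construct to find at least one correct assembly with 95% probability. The function name is hypothetical.

```python
import math

def clones_to_screen(efficiency, confidence=0.95):
    """Smallest n with P(at least one correct clone among n) >= confidence.

    Assumes each clone is independently correct with probability `efficiency`,
    so P(no correct clone in n) = (1 - efficiency) ** n.
    """
    if not 0 < efficiency < 1:
        raise ValueError("efficiency must be in (0, 1)")
    return math.ceil(math.log(1 - confidence) / math.log(1 - efficiency))

for eff in (0.4, 0.7):
    print(f"efficiency {eff:.0%}: screen {clones_to_screen(eff)} clones")
```

Under these assumptions, a handful of clones per construct suffices across the reported efficiency range (6 at 40%, 3 at 70%).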

The modular architecture of the MULTI-evolve framework means it can improve alongside advances in protein language models and experimental techniques. It also complements current computational protein design platforms, promising to enhance iterative refinement of engineered enzymes, genome editors, and therapeutic molecules. Importantly, the Arc Institute has released MULTI-evolve as an open-source resource, including tools for mutation prediction, neural network training, and oligo design, broadening access to state-of-the-art protein evolution methods across the scientific community.

MULTI-evolve’s fusion of AI and experimental biology represents a watershed moment in protein engineering: it advocates strategic data selection to extract maximal mechanistic insight, leveraging pairwise epistatic interactions to navigate the vast mutation space with surgical precision. By transforming months of cyclic laboratory work into matters of weeks, this framework accelerates the discovery of hyperactive protein variants essential for biotechnology, medicine, and synthetic biology. The Arc Institute’s innovation exemplifies how machine learning, when tightly integrated into experimental workflows, profoundly alters the future of biological design.

The research detailing this breakthrough, led by Vincent Q. Tran and colleagues, was published in Science on February 19, 2026. As protein engineering continues to advance at the intersection of computational prediction and empirical validation, tools such as MULTI-evolve are set to redefine the pace and scope of directed evolution across disciplines.

Subject of Research: Cells

Article Title: Rapid directed evolution guided by protein language models and epistatic interactions

News Publication Date: 19-Feb-2026

Web References: https://dx.doi.org/10.1126/science.aea1820

References:
Tran, V.Q., Nemeth, M., Bartie, L.J., Chandrasekaran, S.S., Fanton, A., Moon, H.C., Hie, B.L., Konermann, S., & Hsu, P.D. (2026). Rapid directed evolution guided by protein language models and epistatic interactions. Science.

Image Credits: Arc Institute

Keywords

Protein engineering, Artificial intelligence