In the ongoing quest to discover new therapeutics for complex diseases such as cancer and infectious diseases, researchers are increasingly turning to nature’s own molecular architectures for inspiration. Among these are lasso peptides, a class of bacterial natural products characterized by their distinctive knot-like conformations. These peptides possess exceptional stability and a wide spectrum of biological activities, making them highly attractive scaffolds for drug discovery. To harness their clinical potential, a team from the Carl R. Woese Institute for Genomic Biology has developed LassoESM, a protein language model designed specifically to predict the properties of lasso peptides more accurately than general-purpose models.
Lasso peptides are synthesized by bacteria through a fascinating biosynthetic pathway wherein ribosomes assemble linear chains of amino acids, which are subsequently folded into a slipknot-like structure by specialized biosynthetic enzymes. This unique topology bestows lasso peptides with extraordinary stability against enzymatic degradation and environmental stressors. Thousands of different lasso peptides have been identified across microbial species, many exhibiting potent antibacterial, antiviral, and anticancer properties that underscore their therapeutic promise.
Professor Doug Mitchell, co-leader of the study and Director of the Vanderbilt Institute for Chemical Biology, emphasized the untapped potential of these molecules: “The unique structural features of lasso peptides make them ideal candidates for targeting challenging receptors and developing robust oral therapeutics. By creating a dedicated language model tailored for these peptides, we now have a powerful computational tool to accelerate discovery and design in this emerging field.”
While machine learning has become integral in analyzing vast biological datasets, existing AI models such as AlphaFold, despite their revolutionary impact on protein structure prediction, face intrinsic limitations when applied to lasso peptides. The atypical lasso fold deviates significantly from canonical protein structures, rendering traditional prediction algorithms ineffective in accurately modeling their complex topology. This gap motivated the development of LassoESM, a bespoke protein language model specifically trained on the sequences and structural intricacies of lasso peptides.
Unlike generic protein language models that learn from a broad range of protein sequences, LassoESM was trained on a curated, manually validated dataset of thousands of lasso peptide sequences. The model uses a masked language modeling technique, in which parts of a peptide sequence are hidden and the model is trained to predict them from the surrounding residues, enabling it to learn the underlying “language” of lasso peptide biosynthesis and folding patterns. This allows LassoESM to capture subtle sequence-structure relationships unique to the lasso fold, which conventional models overlook.
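To make the masking step concrete, the following is a minimal sketch of how training examples for masked language modeling can be prepared. It is not the LassoESM code: the function name `mask_sequence`, the `<mask>` token string, the 15% masking rate (a common default for this technique), and the example sequence are all assumptions chosen for illustration.

```python
import random

MASK = "<mask>"  # assumed mask token; the real tokenizer may use a different symbol


def mask_sequence(seq, mask_prob=0.15, rng=None):
    """Hide a random subset of residues.

    Returns (tokens, targets): tokens is the sequence with some residues
    replaced by MASK, and targets maps each masked position to the
    original residue the model would be trained to predict.
    """
    if rng is None:
        rng = random.Random()
    tokens = list(seq)
    targets = {}
    for i, residue in enumerate(seq):
        if rng.random() < mask_prob:
            targets[i] = residue
            tokens[i] = MASK
    if not targets:
        # Guarantee at least one masked position so every example is usable.
        i = rng.randrange(len(tokens))
        targets[i] = tokens[i]
        tokens[i] = MASK
    return tokens, targets


# Illustrative amino-acid string only; not taken from the study's dataset.
example = "GGAGHVPEYFVGIGTPISFYG"
masked, answers = mask_sequence(example, rng=random.Random(0))
print(" ".join(masked))
print(answers)
```

During training, a model would receive the masked tokens and be scored on how well it recovers the residues stored in `targets`; repeating this over many sequences is what lets it pick up the statistical regularities of the peptide family.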
A core functionality of LassoESM lies in its ability to predict interactions between lasso peptides and their biosynthetic enzymes, particularly lasso cyclases, the specialized enzymes responsible for catalyzing the knot-tying step of peptide biosynthesis. Since each lasso cyclase recognizes specific peptide substrates much like keys fitting into distinct locks, deciphering these interactions is crucial for engineering novel peptides with designed functionalities. The LassoESM model can infer which lasso cyclases are compatible with a given peptide sequence, a task that was previously difficult because experimental data are sparse and enzyme-substrate specificity is complex.
The collaborative effort harnessed the complementary expertise of the Mitchell and Shukla laboratories, combining bioinformatics, machine learning, and experimental validation. They initially employed bioinformatics approaches to collect a comprehensive catalog of lasso peptides from diverse microorganisms, followed by manual validation to ensure the accuracy of sequence annotations. This high-quality dataset was essential for reliably training the language model. Subsequently, the model was fine-tuned for multiple predictive tasks, including lasso peptide enzymatic compatibility, structural property inference, and functional annotation.
Dr. Diwakar Shukla, co-leader and chemical engineering professor at the University of Illinois Urbana-Champaign, highlighted the transformative impact of this approach: “By decoding the molecular ‘language’ of lasso peptides, LassoESM opens new horizons in predicting properties and functions that have remained elusive. This tool enables us to not only predict structure but also to rationally design peptides with tailored features for specific biomedical applications.”
Despite the limited availability of labeled experimental data—a common bottleneck in peptide research—LassoESM demonstrated robust performance in predicting diverse lasso peptide properties. This capability significantly reduces the empirical trial-and-error burden traditionally associated with discovering and optimizing peptide therapeutics. The model thereby streamlines the development pipeline from sequence to functional candidate, accelerating translational applications in industry and medicine.
Looking ahead, the researchers aspire to extend this AI-driven framework to other classes of peptide natural products beyond lassos. They envision developing specialized language models capable of capturing the nuances of various peptide topologies and their biosynthetic strategies. Additionally, the team aims to leverage LassoESM in engineering peptides that selectively target specific proteins, potentially creating a new generation of peptide-based therapeutics with enhanced efficacy and stability.
The development and application of LassoESM exemplify the power of interdisciplinary collaboration and cutting-edge computational resources. Supported by the National Institutes of Health and facilitated by the robust infrastructure at the Carl R. Woese Institute for Genomic Biology, this research represents a significant advance in peptide engineering. As machine learning continues to evolve, tailored models such as LassoESM are poised to revolutionize how scientists understand, design, and deploy complex biomolecules in real-world therapies.
In summary, LassoESM is an innovative language model that captures the structural and functional essence of lasso peptides, overcoming longstanding challenges in prediction and design. By enabling precise forecasting of peptide properties and enzyme compatibility, it paves the way for rational, AI-driven development of novel therapeutics. This work stands as a compelling testament to the synergy between computational biology and experimental science in transforming drug discovery.
Subject of Research: Lasso peptides and protein language models for biomedical and industrial applications
Article Title: LassoESM a tailored language model for enhanced lasso peptide property prediction
News Publication Date: 29-Sep-2025
Web References:
https://doi.org/10.1038/s41467-025-63412-3
Image Credits: Xuenan Mi, Isaac Mitchell
Keywords: Protein engineering, Machine learning, Artificial intelligence, Peptides, Drug discovery, Bioinformatics
Tags: advanced language models in biology, antibacterial and antiviral peptides, biosynthetic pathways of lasso peptides, cancer therapeutics from lasso peptides, lasso peptides, machine learning in peptide research, microbial natural products research, natural products in medicine, peptide engineering for drug discovery, predicting peptide properties with AI, stability of lasso peptide structures, therapeutic applications of lasso peptides