The quest to understand proteins at an atomic level has long been a cornerstone of biomedical science. Proteins, composed of sequences of amino acids, fold into specific three-dimensional shapes that dictate their function in living organisms. This structural knowledge is crucial not only for basic biological insight but also for the rational design of new therapies, many of which target proteins or are themselves proteins, such as enzymes and antibodies. Despite this importance, determining protein structures experimentally has long been a painstaking, resource-intensive task, and a fast computational alternative only arrived with artificial intelligence (AI)-based predictive models.
In recent years, AI techniques have revolutionized structural biology by accurately predicting protein folds from amino acid sequences. Programs like AlphaFold and RoseTTAFold have transformed our capacity to visualize proteins in silico. These models use deep learning architectures trained on known protein structures to infer the spatial arrangements of new protein sequences. The achievements of these methods were recognized by the 2024 Nobel Prize in Chemistry, underscoring their scientific and medical significance.
Yet, despite this remarkable progress, important questions about the underlying mechanisms by which these AI models operate remain unanswered. The latest iterations of these algorithms extend their capabilities beyond predicting isolated protein structures. They now also model how proteins interact with other molecules—commonly referred to as ligands—such as candidate drug compounds. This co-folding or docking prediction holds immense potential for drug discovery, providing a computational shortcut to designing molecules that fit precisely into protein binding sites.
Professor Markus Lill and his team at the University of Basel’s Department of Pharmaceutical Sciences have recently investigated these promising developments with a critical eye. Their research focuses on designing active pharmaceutical ingredients, and naturally, they wondered if current AI models genuinely comprehend the physical chemistry underlying protein-ligand interactions. Given the relatively small dataset of approximately 100,000 protein-ligand structures available for training, they suspected these AI systems might be leveraging superficial pattern recognition rather than fundamental scientific principles.
In their study, Lill’s group introduced deliberate modifications to hundreds of protein sequences to disrupt or alter the chemistry of known ligand binding sites. These included changing the charge distribution dramatically or completely occluding binding pockets. Surprisingly, despite these profound alterations that would normally abrogate ligand binding in reality, the AI models continued predicting the original protein-ligand complex structures, almost as if the modifications had never occurred.
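The kind of sequence perturbation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the charge-swap table, and the choice of phenylalanine as the "bulky" residue are hypothetical and are not taken from the study, which may have used different mutation rules.

```python
from typing import Sequence

# Assumed mapping for illustration: acidic residues become lysine,
# basic residues become glutamate. Not the study's actual protocol.
CHARGE_FLIP = {"D": "K", "E": "K", "K": "E", "R": "E"}


def flip_charges(sequence: str, site_positions: Sequence[int]) -> str:
    """Invert the charge of charged residues at the given 0-based positions."""
    residues = list(sequence)
    for pos in site_positions:
        # Uncharged residues are left as they are.
        residues[pos] = CHARGE_FLIP.get(residues[pos], residues[pos])
    return "".join(residues)


def occlude_pocket(sequence: str, site_positions: Sequence[int], bulky: str = "F") -> str:
    """Replace every binding-site residue with one bulky residue (assumed: 'F')."""
    residues = list(sequence)
    for pos in site_positions:
        residues[pos] = bulky
    return "".join(residues)
```

Each mutated sequence would then be passed to a co-folding model together with the original ligand; a physically grounded model would be expected to change or abandon the predicted binding pose.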
A similar approach was taken with the ligands themselves. By altering the chemical structures of the ligands to prevent any possible interaction with their target proteins, the researchers found that the AI predictions remained largely unchanged. In over half the cases examined, the models failed to account for these perturbations, predicting stable complexes that physically and chemically should not exist.
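One way to quantify whether a prediction "remained largely unchanged" is the root-mean-square deviation (RMSD) between ligand atom positions before and after a perturbation. The sketch below assumes that both predictions are already superimposed on the same protein frame, that atoms are listed in matching order (for a modified ligand, only the atoms shared by both versions), and that a pose within 2.0 Å counts as unchanged. That cutoff is a common docking convention adopted here as an assumption, not a value reported in the article.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ligand_rmsd(pose_a: Sequence[Coord], pose_b: Sequence[Coord]) -> float:
    """RMSD between two ligand poses with identical atom ordering (no re-alignment)."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(pose_a, pose_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(pose_a))


def fraction_unchanged(rmsds: Sequence[float], threshold: float = 2.0) -> float:
    """Share of cases whose pose moved by at most `threshold` (assumed cutoff, in Å)."""
    if not rmsds:
        raise ValueError("need at least one RMSD value")
    return sum(r <= threshold for r in rmsds) / len(rmsds)
```

A high unchanged fraction after perturbations that should abolish binding would point to the memorization behavior the researchers describe.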
This compelling evidence suggests that the AI co-folding models do not yet internalize the physicochemical laws that govern molecular recognition and binding affinity. Instead, they appear to rely heavily on data memorization—pattern matching gleaned from their training sets—without a genuine mechanistic understanding. The models can generate plausible-looking structures but lack the capacity to predict outcomes when confronted with novel or deliberately modified molecules and protein sites.
Compounding this issue is the fact that these AI systems struggle considerably when dealing with proteins unlike any they were trained on. When encountering entirely new folds or ligands with no close analogs in the training data, their predictive accuracy drastically decreases. This limitation is particularly consequential given that novel drug targets often involve previously uncharacterized proteins. The inability of these models to generalize beyond known data restricts their utility for cutting-edge drug discovery.
Professor Lill emphasizes a note of caution for the pharmaceutical community. While AI-derived structural models hold great promise for accelerating drug development, relying solely on these predictions without experimental validation or supplementary computational techniques that incorporate physical chemistry can lead to misleading conclusions. Empirical validation remains indispensable to verify AI-based hypotheses and refine candidate drug molecules accordingly.
Looking forward, the researchers propose an exciting direction: integrating the fundamental principles of physics and chemistry directly into AI frameworks. By embedding these constraints and mechanistic insights into machine learning architectures, future models could generate predictions grounded in molecular reality rather than solely on statistical correlation. Such hybrid approaches may yield more accurate and reliable structures, even for uncharted protein-ligand systems.
This integration could profoundly impact drug discovery pipelines by enabling the targeted design of molecules for proteins currently deemed “undruggable” due to their complex or elusive structures. Moreover, enhanced model reliability could shorten development timelines, reduce costly experimental iterations, and catalyze novel therapeutic strategies. The union of AI sophistication with physical law could mark the next transformative leap in biomedical research.
The current study, published in Nature Communications, serves as a vital wake-up call highlighting both the dazzling potential and existing shortcomings of AI in structural biology and pharmacology. It underscores the imperative for multidisciplinary approaches that combine machine learning prowess with rigorous physicochemical understanding. This synergy will be essential to unlock the full promise of AI-guided drug design and realize truly transformative healthcare innovations.
As the field races forward, nuanced scrutiny such as that by Professor Lill and colleagues will ensure that AI tools evolve not only in accuracy but in conceptual depth. Such progress will empower researchers to wield AI models as dependable scientific instruments rather than black boxes, ultimately expediting the discovery of life-saving medicines.
Subject of Research:
Physics-based evaluation of deep learning models predicting protein-ligand co-folding structures
Article Title:
Investigating whether deep learning models for co-folding learn the physics of protein-ligand interactions
News Publication Date:
6-Oct-2025
Web References:
https://doi.org/10.1038/s41467-025-63947-5
Keywords:
Protein Structure Prediction, AI in Drug Discovery, AlphaFold, RoseTTAFold, Protein-Ligand Interactions, Deep Learning, Structural Biology, Computational Chemistry, Molecular Docking, Physicochemical Properties, Machine Learning Limitations, Pharmaceutical Sciences
Tags: AI in drug design, AlphaFold advancements, artificial intelligence in structural biology, biomedical research breakthroughs, challenges in AI models, deep learning in biology, future of protein modeling, mechanisms of AI algorithms, protein structure prediction, rational drug design techniques, RoseTTAFold applications, significance of protein folding



