As the foundation of popular chatbots, large language models (LLMs) such as GPT-4, trained on vast amounts of text data, have contributed to advances in diverse fields including literature, art, and science, but their potential in the complex realms of biology and genomics has yet to be fully unlocked.
Credit: Insilico Medicine
- Based on Microsoft BioGPT, Insilico Medicine’s R&D team proposed a novel approach for predicting therapeutic targets using a large language model (LLM) specifically trained for biomedical tasks;
- A total of 9 potential dual-purpose targets against aging and 14 major age-related diseases were discovered, with CCR5 and PTH nominated as novel targets for anti-aging;
- Apart from target selection, the method can be applied to extensive ranking tasks, even without clear criteria.
Insilico Medicine, a clinical-stage generative artificial intelligence (AI)-driven drug discovery company, announced that it has used the connection retrieval ability of Microsoft BioGPT to identify 9 potential dual-purpose targets against both the aging process and 14 major age-related diseases. Two of the proposed genes have not previously been linked to the aging process, indicating the potential of Transformer models in novel target prediction and other ranking tasks across the biomedical field. The findings were published in the journal Aging.
Most LLMs are trained to continue text: given a context, they suggest the next word according to a probability distribution learned from the connections in their training data. Given a plausible prompt and adequate background data, scientists can now apply LLMs, especially specialized models, to the target prioritization process.
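As a rough illustration of next-word prediction, the sketch below turns a set of raw scores into a probability distribution and picks the most likely token. The candidate tokens and their scores are invented for this example and do not come from BioGPT or any other real model.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution over tokens."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

# Hypothetical scores a model might assign to candidate next tokens
# after some context; the numbers are invented for illustration.
toy_logits = {"protein": 2.0, "gene": 1.0, "banana": -3.0}

probs = softmax(toy_logits)
next_token = max(probs, key=probs.get)  # greedy choice: most probable token
```

A real model computes such scores over its entire vocabulary at every step; the toy dictionary only stands in for that output.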
BioGPT, the domain-specific generative Transformer language model, was jointly proposed by Microsoft Research and Peking University in China. Pre-trained on millions of previously published biomedical research articles, the model outperformed previous models in multiple biomedical natural language processing tasks and demonstrated human parity in analyzing biomedical research to answer questions.
To further enhance the performance of BioGPT, Insilico researchers trained it on a dataset of 900,000 grant proposals from the National Institutes of Health and evaluated the effect using enrichment log fold change (ELFC) and hypergeometric p-value (HGPV) scores. Next, the team established a target discovery pipeline comprising the prompt, retrieval of token probabilities, and gene probability calculation.
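The two evaluation scores can be sketched as follows, under stated assumptions. The hypergeometric p-value is the standard upper-tail probability of seeing at least the observed number of known targets in a top-ranked list. The ELFC definition used here (log2 of the observed hit rate over the background hit rate) is an assumption for illustration and may differ from the exact formula in the paper; all counts are invented.

```python
import math

def hypergeom_pvalue(population, known_total, drawn, known_drawn):
    """P(X >= known_drawn) when `drawn` genes are sampled without replacement
    from `population` genes, of which `known_total` are known targets."""
    upper = min(known_total, drawn)
    denom = math.comb(population, drawn)
    tail = sum(
        math.comb(known_total, i) * math.comb(population - known_total, drawn - i)
        for i in range(known_drawn, upper + 1)
    )
    return tail / denom

def enrichment_log_fold_change(population, known_total, drawn, known_drawn):
    """Assumed definition: log2(observed hit rate / background hit rate).
    Undefined when known_drawn is 0, since log2(0) has no finite value."""
    observed = known_drawn / drawn
    expected = known_total / population
    return math.log2(observed / expected)

# Invented example: 20,000 genes, 100 known targets for a disease,
# and 5 of them appear among the model's top 50 predictions.
p_value = hypergeom_pvalue(20000, 100, 50, 5)
elfc = enrichment_log_fold_change(20000, 100, 50, 5)
```

A small p-value together with a large positive ELFC would suggest that the ranked list contains more known targets than chance alone would explain.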
Using the final prompt sentence of “human gene targeted by a drug for treating {DISEASE} is the,” and the general tokenizer from BioGPT, the researchers proposed potential targets after several cycles of probability retrieval. In the end, 9 targets were nominated as dual-purpose targets against aging and all 14 age-related diseases, including Alzheimer’s disease, amyotrophic lateral sclerosis, and idiopathic pulmonary fibrosis. Two of them, CCR5 and PTH, are considered novel age-related targets.
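One way to picture the gene probability step is shown below: fill the prompt template with a disease name, then score each candidate gene as the product of the conditional probabilities of its tokens. This is a minimal sketch and not the published pipeline. The lookup table, the way gene names are split into tokens, and every probability are hypothetical stand-ins for a real language model call, so the resulting order carries no biological meaning.

```python
PROMPT_TEMPLATE = "human gene targeted by a drug for treating {DISEASE} is the"

# Hypothetical conditional probabilities P(next token | gene tokens so far).
# Invented numbers; a real pipeline would query the language model instead.
TOY_TABLE = {
    (): {"CC": 0.20, "PT": 0.10, "the": 0.70},
    ("CC",): {"R5": 0.50, "L2": 0.50},
    ("PT",): {"H": 0.90, "EN": 0.10},
}

def toy_next_token_probs(prompt, generated):
    """Stand-in for a model call; this toy version ignores the prompt."""
    return TOY_TABLE.get(tuple(generated), {})

def gene_probability(prompt, gene_tokens):
    """Chain rule: multiply the probability of each token given the previous ones."""
    prob = 1.0
    generated = []
    for token in gene_tokens:
        prob *= toy_next_token_probs(prompt, generated).get(token, 0.0)
        generated.append(token)
    return prob

prompt = PROMPT_TEMPLATE.format(DISEASE="idiopathic pulmonary fibrosis")

# Assumed token splits for two gene symbols (hypothetical tokenization).
candidates = {"CCR5": ["CC", "R5"], "PTH": ["PT", "H"]}
scores = {gene: gene_probability(prompt, toks) for gene, toks in candidates.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Repeating this scoring for each disease prompt and comparing the resulting lists is one plausible way to look for genes that rank highly across many conditions.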
“I am thrilled to see this breakthrough based on LLMs presented by the Insilico team, as it highlights the potential of a Transformer and generative AI approach combined with specific databases,” says Alex Zhavoronkov, PhD, founder and CEO of Insilico Medicine. “We hope to further accelerate drug R&D processes using our proprietary Pharma.AI platform in this era of biotech paradigm change.”
“BioGPT can learn and understand large amounts of medical literature, thereby empowering practical processes including novel drug research and development, medical knowledge graph development, precision medicine, and medical dialogue assistance systems, and driving new biotechnology developments,” said Tao Qin, PhD, Senior Principal Researcher at Microsoft Research AI4Science. “The research results released by Insilico Medicine shed light on new practical application scenarios for BioGPT and other LLM-based AI engines. We look forward to further real-world applications and more breakthroughs.”
A leader in generative AI for drug discovery, Insilico Medicine has established and validated its proprietary end-to-end Pharma.AI platform across target discovery, small molecule generation, and clinical trial design. Recently, the company published the validation results of inClinico in Clinical Pharmacology and Therapeutics, where the Transformer-based clinical trial prediction tool achieved 79% accuracy in prospective validation.
About Insilico Medicine
Insilico Medicine, a clinical-stage end-to-end artificial intelligence (AI)-driven drug discovery company, connects biology, chemistry, and clinical trials analysis using next-generation AI systems. The company has developed AI platforms that utilize deep generative models, reinforcement learning, transformers, and other modern machine learning techniques to discover novel targets and to design novel molecular structures with desired properties. Insilico Medicine delivers breakthrough solutions to discover and develop innovative drugs for cancer, fibrosis, immunity, central nervous system (CNS), and aging-related diseases. For more information, visit www.insilico.com
Journal: Aging-US
DOI: 10.18632/aging.205055
Article Title: Biomedical generative pre-trained based transformer language model for age-related disease target discovery
Article Publication Date: 22-Sep-2023