Machine learning-aided scoring of synthesis difficulties for designer

Artificially synthesizing genomes has broad prospects in fields such as medical research and industrial strains. From the synthesis of the artificial life JCVI-syn1.0 by Craig Venter’s team in 2010, to the rewriting and synthesis of the prokaryotic E. coli genome, and to the Sc2.0 project’s artificial synthesis of the yeast genome, researchers are constantly advancing in the depth and breadth of genome design and synthesis. However, there are still difficulties in synthesizing certain gene segments, ultimately leading to the inability to complete artificial chromosomes, which limits the application and promotion of artificial genome synthesis technology. To address this issue, the team of Professor Yingjin Yuan from Tianjin University has developed an interpretable machine learning framework (Figure 1) that can predict and quantify the difficulty of chromosome synthesis, providing guidance for optimizing chromosome design and synthesis processes.

Credit: ©Science China Press

The research team designed an efficient feature selection method by analyzing data of a large number of known chromosome fragments, and identified six key sequence features that cover energy and structural information during DNA chemical synthesis and assembly. Based on these results, the team developed an eXtreme Gradient Boosting (XGBoost) model that can effectively predict the synthesis difficulties of chromosome fragments. The model achieved an AUC (area under the receiver operating characteristic curves) of 0.895 in cross-validation and an AUC of 0.885 on an independent test set in collaboration with a DNA synthesis company, demonstrating a high accuracy and predictive ability.

The research team proposed a Synthesis difficulty Index (S-index) based on the SHAP algorithm to evaluate and interpret the synthesis difficulties of chromosomes. The study found that there were significant differences in the synthesis difficulties of different chromosomes, and the S-index could quantitatively explain the causes of synthesis difficulties for some gene fragments (Figure 2), providing a basis for chromosome sequence design and synthesis and improving the efficiency and success rate of designer chromosome synthesis. This achievement provides a practical tool for researchers in chromosome engineering and genome rewriting, and is expected to provide more comprehensive guidance and support for chromosome design and synthesis.

###

See the article:

Machine learning unravels the complexity of varied difficulties in designer chromosome synthesis

https://doi.org/10.1007/s11427-023-2306-x

Journal

Science China Life Sciences

DOI

10.1007/s11427-023-2306-x

Machine learning-aided scoring of synthesis difficulties for designer chromosomes

Related Posts

Florida Cane Toad: Complex Spread and Selective Evolution

New Study Uncovers Mechanism Behind Burn Pit Particulate Matter–Induced Lung Inflammation

DeepBlastoid: Advancing Automated and Efficient Evaluation of Human Blastoids with Deep Learning

Navigating the Gut: The Role of Formic Acid in the Microbiome

POPULAR NEWS

Robotic Ureteral Reconstruction: A Novel Approach

Digital Privacy: Health Data Control in Incarceration

Breakthrough in RNA Research Accelerates Medical Innovations Timeline

Mapping Tertiary Lymphoid Structures for Kidney Cancer Biomarkers

About

Follow us

Recent News

Group Therapy Boosts Recovery in Elderly Depression

Evaluating Biosimilar Trastuzumab for Breast Cancer in Thailand

Decoding Phantom Limb Movements via Intraneural Signals

Subscribe to Blog via Email

Welcome Back!

Retrieve your password