Wednesday, May 20, 2026
BIOENGINEER.ORG

Shandong University Researchers Innovate Multi-Scale Feature Fusion and Weighted Ensemble Learning for Precise Promoter Identification Across Cell Lines

by Bioengineer
May 20, 2026
in Biology

Promoters are essential elements within the genome that orchestrate the initiation of gene expression by attracting transcriptional machinery to specific DNA sequences. These regulatory regions act as gatekeepers, dictating whether a gene is switched on or off in a particular cell type. This cell-type specificity is a major challenge for computational biologists attempting to accurately identify promoters because a sequence that functions as a promoter in one cellular environment may be inactive in another. The complexity intensifies due to the vast heterogeneity and sequence diversity of promoter regions across the genome. Traditional computational models, typically trained on data from a limited number of cell lines, often struggle to generalize to novel cellular contexts, resulting in reduced accuracy and robustness.

In response to these limitations, a dedicated research team at the Shenzhen Research Institute and the Schools of Mathematics and Software at Shandong University has developed an innovative deep learning-based framework known as MuSE-Promoter. This model tackles the challenge of promoter identification across multiple cell types by integrating diverse computational features and sophisticated neural network architectures. Unlike previous methods that rely heavily on a single type of input feature, MuSE-Promoter leverages a multimodal approach that incorporates semantic embeddings and handcrafted biophysical descriptors to capture various facets of promoter sequences.

The core strength of MuSE-Promoter lies in its ability to process raw DNA sequences via parallel computational branches. One branch extracts semantic embeddings using natural language processing techniques adapted for genomics, specifically DNABERT and Word2Vec algorithms. These embeddings capture the underlying “grammar” of regulatory DNA in a manner analogous to language models interpreting text. The other branch extracts handcrafted biophysical features, including tri-nucleotide physicochemical properties and reverse-complement k-mer frequencies. These features add complementary information regarding the structural and physicochemical attributes of DNA, which are pivotal for transcription factor binding and promoter functionality.
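
One of the handcrafted descriptors mentioned above, reverse-complement k-mer frequency, can be illustrated with a minimal sketch. It is not taken from the MuSE-Promoter repository: the function names, the default k, and the choice of the lexicographically smaller string as the shared representative are assumptions made for illustration. The helper `kmer_tokens` also shows the overlapping k-mer "words" that Word2Vec-style and DNABERT-style models typically take as input.

```python
from collections import Counter
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_tokens(seq: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping k-mer 'words' (stride 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def reverse_complement(kmer: str) -> str:
    """Complement each base, then reverse the string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Pick one representative for a k-mer and its reverse complement.

    Assumption: the lexicographically smaller of the two is used.
    """
    return min(kmer, reverse_complement(kmer))


def rc_kmer_frequencies(seq: str, k: int = 2) -> dict[str, float]:
    """Normalized counts of k-mers, merging each k-mer with its reverse complement."""
    # Enumerate every canonical k-mer so the feature vector has a fixed layout.
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    # Windows containing ambiguous bases (e.g. N) are skipped.
    counts = Counter(
        canonical(tok) for tok in kmer_tokens(seq, k) if set(tok) <= set("ACGT")
    )
    total = sum(counts.values())
    return {key: (counts[key] / total if total else 0.0) for key in keys}
```

For the toy sequence "ACGT" with k=2, the windows AC, CG, and GT collapse to AC (twice, because GT is the reverse complement of AC) and CG (once), so the vector has 10 entries instead of 16. Merging the two strands this way means a promoter read from either strand yields the same features.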

The combined features feed into a multi-scale convolutional neural network enhanced with squeeze-excitation attention mechanisms. This architecture is designed to detect sequence motifs of varying lengths efficiently, recognizing intricate patterns that may be critical for promoter activity. Following convolutional feature extraction, a transformer encoder models long-range interactions within the promoter sequence, accounting for dependencies that span tens or hundreds of base pairs. This step is crucial because promoter function often depends on complex interactions across extended regions rather than localized motifs alone.
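
The idea of multi-scale convolution followed by squeeze-excitation gating can be sketched without any deep learning library. Everything below is an illustrative assumption rather than the published architecture: the kernel widths (3, 5, 7), the single filter per width, the random untrained weights, and the bottleneck size. The transformer encoder stage is not shown.

```python
import math
import random

ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}


def one_hot(seq):
    """Encode a DNA string as a list of 4-dimensional vectors (unknown bases -> zeros)."""
    return [ONE_HOT.get(base, [0, 0, 0, 0]) for base in seq.upper()]


def conv1d_same(x, kernel):
    """1D convolution with zero padding and ReLU; output length equals input length."""
    width, pad = len(kernel), len(kernel) // 2
    out = []
    for pos in range(len(x)):
        total = 0.0
        for j in range(width):
            idx = pos + j - pad
            if 0 <= idx < len(x):
                total += sum(w * v for w, v in zip(kernel[j], x[idx]))
        out.append(max(0.0, total))
    return out


def squeeze_excite(channels, w1, w2):
    """Re-weight each channel by a gate computed from its global average."""
    squeezed = [sum(ch) / len(ch) for ch in channels]            # squeeze
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]                                       # excitation
    scaled = [[g * v for v in ch] for g, ch in zip(gates, channels)]
    return scaled, gates


def multi_scale_se(seq, widths=(3, 5, 7), seed=0):
    """One filter per kernel width, then squeeze-excitation over the resulting channels."""
    rng = random.Random(seed)
    x = one_hot(seq)
    channels = []
    for width in widths:
        # Untrained random weights, used only to show the data flow.
        kernel = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(width)]
        channels.append(conv1d_same(x, kernel))
    n, hidden = len(channels), max(1, len(channels) // 2)
    w1 = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(n)]
    return squeeze_excite(channels, w1, w2)
```

Each kernel width responds to motifs of a different length, and the gates let the network emphasize whichever scale is most informative for a given sequence. In a trained model the kernels and gate weights would be learned, and many filters per width would be used.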

MuSE-Promoter further integrates the outputs from this deep learning backbone with predictions from a random forest classifier through a learnable weighted ensemble. This ensemble technique balances the strengths of neural networks and traditional machine learning methods, enhancing the overall robustness of predictions. Such a strategy mitigates overfitting, a common pitfall when models trained on one cell line are applied to others, facilitating more reliable cross-cell-line promoter identification.
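
A learnable weighted ensemble of two probability outputs can be sketched as follows. This is one plausible formulation, assumed for illustration: a single scalar weight, kept in (0, 1) by a sigmoid, is fitted by gradient descent on binary cross-entropy over held-out predictions. The article does not specify how MuSE-Promoter parameterizes or trains its weights, and the toy numbers in the usage note are invented.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def blend(p_nn, p_rf, w):
    """Convex combination of neural-network and random-forest probabilities."""
    return [w * n + (1.0 - w) * r for n, r in zip(p_nn, p_rf)]


def bce(y, p, eps=1e-7):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for t, q in zip(y, p):
        q = min(max(q, eps), 1.0 - eps)
        total -= t * math.log(q) + (1 - t) * math.log(1.0 - q)
    return total / len(y)


def fit_weight(p_nn, p_rf, y, lr=0.5, steps=500, eps=1e-7):
    """Fit w = sigmoid(a) by gradient descent on the cross-entropy of the blend."""
    a = 0.0  # start from equal weights (w = 0.5)
    for _ in range(steps):
        w = sigmoid(a)
        grad = 0.0
        for n, r, t in zip(p_nn, p_rf, y):
            p = min(max(w * n + (1.0 - w) * r, eps), 1.0 - eps)
            # Chain rule: dL/dp * dp/dw * dw/da
            grad += (p - t) / (p * (1.0 - p)) * (n - r) * w * (1.0 - w)
        a -= lr * grad / len(y)
    return sigmoid(a)
```

With toy labels `y = [1, 0, 1, 0]`, sharper network outputs `p_nn = [0.9, 0.1, 0.8, 0.2]`, and noisier forest outputs `p_rf = [0.6, 0.5, 0.4, 0.6]`, `fit_weight` moves the weight above 0.5, toward the better-calibrated model. When the two models make complementary errors, the fitted weight settles between the extremes instead.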

The researchers rigorously evaluated MuSE-Promoter on data from four human cell lines—GM12878, HeLa-S3, HUVEC, and K562—as well as on promoter datasets from the plant Arabidopsis thaliana encompassing both TATA-box and non-TATA promoters. The comparative analyses demonstrated that MuSE-Promoter consistently outperforms state-of-the-art promoter prediction tools such as iPro-WAEL and Z-curve. Its superiority is especially pronounced in challenging scenarios involving cross-cell-line generalization and differentiation between promoters and enhancers, which are often confounded due to overlapping regulatory characteristics.

In cross-cell-line validation tests, MuSE-Promoter achieved an average area under the curve (AUC) of 0.991 and Matthews correlation coefficient (MCC) values above 0.92. These metrics indicate that the model generalizes promoter identification beyond the cell line it was trained on, a notable advance over prior methods. The model's learned sequence representations also showed clear separability between promoters and non-promoters in high-dimensional feature space, and the model assigned high importance weights to features described as biologically established, such as CGA, CC, and RCKmer (the reverse-complement k-mer descriptor). The capacity to highlight these features underscores the model's interpretability and its alignment with known molecular biology.
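
For readers unfamiliar with the two metrics, the following minimal sketch computes them from their standard definitions. The function names and toy inputs are illustrative and do not reproduce the study's evaluation code. AUC is computed here as the fraction of positive/negative pairs that the scores rank correctly, with ties counted as half, which is equivalent to the area under the ROC curve.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking of positives against negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))                               # 0.75
print(mcc(labels, [1 if s >= 0.5 else 0 for s in scores]))   # 0.0
```

MCC ranges from -1 to 1 and stays informative when promoter and non-promoter classes are imbalanced, which is why it is often reported alongside AUC.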

Professor Hao Wu, co-corresponding author of the study, emphasizes that the strength of MuSE-Promoter derives from combining semantic DNA sequence representations with explicit biophysical insights. “This multi-modal fusion empowers the model to capture the nuanced regulatory language of DNA as well as its structural context, which are both critical for transcription factor recruitment and promoter function,” he notes. Such an integrated approach outperforms models that are limited to either sequence patterns or physicochemical properties alone.

Complementing these insights, Professor Zhangyu Mei highlights the model’s translational potential to advance genome annotation efforts. “MuSE-Promoter is poised to become an indispensable tool for large-scale promoter annotation projects. It enables researchers to decode cell-type-specific regulatory programs more accurately and to distinguish bona fide promoters from other regulatory elements such as enhancers,” Mei explains. This capability is a vital step towards building comprehensive maps of gene regulation that reflect cellular specificity and complexity.

Looking forward, the team aims to extend the MuSE-Promoter framework by integrating multi-omics data layers, including epigenomic marks and chromatin accessibility profiles, to refine promoter identification further. Additionally, the researchers plan to adapt the model to predict enhancer-promoter interactions, shedding light on higher-order gene regulatory networks involved in cellular differentiation and disease. These expansions will harness the power of deep learning to unravel even more intricate regulatory mechanisms underpinning genome function.

All code and datasets underpinning MuSE-Promoter have been made openly accessible via their GitHub repository, promoting transparency and enabling broader adoption by the genomics research community. This openness fosters collaborative developments and benchmarking against emerging tools in promoter prediction.

The implications of MuSE-Promoter resonate beyond bioinformatics, offering potential applications in synthetic biology, precision medicine, and developmental biology by facilitating targeted manipulation of gene expression. By accurately identifying promoters active across diverse cell types, scientists can design gene circuits with precise regulatory controls or uncover dysregulated promoters linked to disease states.

This breakthrough represents a crucial stride toward overcoming the enduring challenge of promoter identification amidst the complexity of cell-type specificity. By integrating advanced machine learning architectures with a rich tapestry of genomic features, MuSE-Promoter sets a new standard in computational genomics and promises to accelerate discoveries in gene regulation.

Subject of Research: Cells

Article Title: MuSE-Promoter: a multi-scale feature fusion and weighted ensemble learning method for identifying promoters across multiple cell lines

Web References: https://github.com/HaoWuLab-Bioinformatics/MuSE-Promoter

References: DOI 10.1016/j.mdmed.2026.100002

Image Credits: Xiao Bi, Zhangyu Mei & Hao Wu

Keywords: Bioinformatics, Genetics, Molecular biology, Mathematics, Technology, Biochemical engineering, Artificial intelligence

Tags: biophysical descriptors in promoter modeling, cell-type specific promoter prediction, computational biology in gene expression, cross-cell line genomic prediction, deep learning models for gene regulation, multi-scale feature fusion for promoter identification, multimodal neural networks for DNA analysis, promoter sequence diversity challenges, robust promoter detection algorithms, semantic embeddings in bioinformatics, Shandong University computational genomics research, weighted ensemble learning in genomics


Bioengineer.org © Copyright 2023 All Rights Reserved.
