BIOENGINEER.ORG

Researchers propose new and more effective model for automatic speech recognition

Bioengineer by Bioengineer
September 2, 2022
in Chemistry

Popular voice assistants like Siri and Amazon Alexa have introduced automatic speech recognition (ASR) to the wider public. Though decades in the making, ASR models struggle with consistency and reliability, especially in noisy environments. Chinese researchers developed a framework that effectively improves the performance of ASR for the chaos of everyday acoustic environments.

Integrating pre-training for Acoustic Speech Recognition models

Credit: CAAI Artificial Intelligence Research, Tsinghua University Press


Researchers from the Hong Kong University of Science and Technology and WeBank proposed a new framework, phonetic-semantic pre-training (PSP), and demonstrated its robustness on synthetic, highly noisy speech datasets.

 

Their study was published in CAAI Artificial Intelligence Research on Aug. 28.

 

“Robustness is a long-standing challenge for ASR,” said Xueyang Wu from the Hong Kong University of Science and Technology Department of Computer Science and Engineering. “We want to increase the robustness of the Chinese ASR system with a low cost.”

 

ASR uses machine learning and other artificial intelligence techniques to automatically transcribe speech into text for uses like voice-activated systems and transcription software. But new consumer-focused applications increasingly call for voice recognition to work better: to handle more languages and accents, and to perform more reliably in real-life situations like video conferencing and live interviews.

 

Traditionally, training the acoustic and language models that make up an ASR system requires large amounts of noise-specific data, which can be prohibitive in both time and cost.

 

The acoustic model (AM) turns spoken words into sequences of “phones,” the basic units of sound. The language model (LM) decodes phones into natural-language sentences, usually with a two-step process: a fast but relatively weak LM generates a set of sentence candidates, and a powerful but computationally expensive LM selects the best sentence from the candidates.
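The two-step decoding described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the researchers' implementation: the function name `two_pass_decode`, the `weight` parameter, and the toy scores are all assumptions made for the example.

```python
from typing import Callable, List, Tuple


def two_pass_decode(
    candidates: List[Tuple[str, float]],
    strong_lm_score: Callable[[str], float],
    weight: float = 0.5,
) -> str:
    """Select the best sentence from first-pass candidates.

    candidates: (sentence, log-score) pairs produced by the fast, weak LM.
    strong_lm_score: hypothetical scorer standing in for the expensive LM.
    weight: assumed interpolation weight given to the second-pass score.
    """
    if not candidates:
        raise ValueError("the first pass produced no candidates")

    def combined(item: Tuple[str, float]) -> float:
        sentence, first_pass_score = item
        return (1.0 - weight) * first_pass_score + weight * strong_lm_score(sentence)

    # The second pass can only choose among what the first pass proposed.
    return max(candidates, key=combined)[0]


# Toy data, invented for illustration only.
first_pass = [("wreck a nice beach", -1.5), ("recognize speech", -2.0)]
toy_strong_scores = {"wreck a nice beach": -6.0, "recognize speech": -1.0}

best = two_pass_decode(first_pass, lambda s: toy_strong_scores.get(s, -10.0))
print(best)
```

Because the second pass only reranks the candidate list, a correct sentence that the first pass never proposed cannot be selected, no matter how strong the second LM is.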

 

“Traditional learning models are not robust against noisy acoustic model outputs, especially for Chinese polyphonic words with identical pronunciation,” Wu said. “If the first pass of the learning model decoding is incorrect, it is extremely hard for the second pass to make it up.”

 

The newly proposed PSP framework makes it easier to recover misclassified words. By pre-training a model that translates the AM outputs directly into a sentence using the full context information, researchers can help the LM efficiently recover from the noisy outputs of the AM.

 

The PSP framework allows the model to improve through a pre-training regime called noise-aware curriculum learning, which introduces new skills step by step, starting with easy tasks and moving on to more complex ones.

 

“The most crucial part of our proposed method, Noise-aware Curriculum Learning, simulates the mechanism of how human beings recognize a sentence from noisy speech,” Wu said.  

 

Warm-up is the first stage, where researchers pre-train a phone-to-word transducer on clean phone sequences, which are converted from unlabeled text data only, cutting back on annotation time. This stage “warms up” the model, initializing the basic parameters to map phone sequences to words.

 

In the second stage, self-supervised learning, the transducer learns from more complex data generated by self-supervised training techniques and functions. Finally, the resultant phone-to-word transducer is fine-tuned with real-world speech data.
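One way to picture this staged regime is a data generator that starts from clean phone sequences and corrupts them more heavily as training progresses. The sketch below rests on loud assumptions: the toy lexicon, the random-substitution noise function, and the linear noise schedule are all invented for illustration, and the article does not describe the actual noise functions used in the study.

```python
import random
from typing import Dict, List

# Hypothetical toy lexicon mapping words to phones; not taken from the paper.
TOY_LEXICON: Dict[str, List[str]] = {
    "语音": ["yu3", "yin1"],
    "识别": ["shi2", "bie2"],
}
PHONE_SET: List[str] = sorted({p for ps in TOY_LEXICON.values() for p in ps})


def text_to_phones(words: List[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Convert unlabeled text into a clean phone sequence by lexicon lookup."""
    return [phone for word in words for phone in lexicon[word]]


def add_phone_noise(
    phones: List[str], noise_rate: float, phone_set: List[str], rng: random.Random
) -> List[str]:
    """Replace each phone, with probability noise_rate, by a random draw from phone_set."""
    return [rng.choice(phone_set) if rng.random() < noise_rate else p for p in phones]


def curriculum_rates(num_steps: int, max_rate: float) -> List[float]:
    """Assumed schedule: noise rate grows linearly from 0 (clean) to max_rate."""
    if num_steps < 2:
        raise ValueError("need at least two steps")
    return [max_rate * i / (num_steps - 1) for i in range(num_steps)]


rng = random.Random(0)
clean = text_to_phones(["语音", "识别"], TOY_LEXICON)
for rate in curriculum_rates(3, 0.4):
    # Each (noisy phones, original words) pair would be one training example.
    print(rate, add_phone_noise(clean, rate, PHONE_SET, rng))
```

The first step of the schedule has a noise rate of zero, which corresponds to the warm-up on clean sequences; fine-tuning on real speech data would follow after the synthetic stages and is not shown here.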

 

The researchers experimentally demonstrated the effectiveness of their framework on two real-life datasets collected from industrial scenarios and on synthetic noise. Results showed that the PSP framework effectively improves the traditional ASR pipeline, achieving relative character error rate reductions of 28.63% on the first dataset and 26.38% on the second.
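For readers unfamiliar with the metric, the sketch below shows how a character error rate (CER) and a relative reduction are commonly computed. It uses the standard edit-distance definition of CER; the function names and the baseline figure are illustrative assumptions, since the article does not report the absolute error rates.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error that the improved system removes."""
    return (baseline - improved) / baseline


# Hypothetical baseline CER of 10%; a 28.63% relative reduction would then
# correspond to an absolute CER of 0.10 * (1 - 0.2863) = 0.07137.
baseline_cer = 0.10
improved_cer = baseline_cer * (1 - 0.2863)
print(round(relative_reduction(baseline_cer, improved_cer), 4))
```

A relative reduction is measured against the baseline error, so the same percentage implies a smaller absolute gain when the baseline is already low.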

 

As next steps, the researchers will investigate more effective PSP pre-training methods with larger unpaired datasets, seeking to maximize the effectiveness of pre-training for noise-robust LMs.

 

Other contributors include Rongzhong Lian, Di Jiang, Yuanfeng Song, Weiwei Zhao, Qian Xu, and Qiang Yang from WeBank Co. Ltd. Qian Xu and Qiang Yang are also affiliated with The Hong Kong University of Science and Technology.

 

CAAI Artificial Intelligence Research is a new journal jointly sponsored by the Chinese Association for Artificial Intelligence (CAAI) and Tsinghua University. This is the first paper published in the journal.

 


About CAAI Artificial Intelligence Research 

 

CAAI Artificial Intelligence Research is a peer-reviewed journal jointly sponsored by the Chinese Association for Artificial Intelligence (CAAI) and Tsinghua University. The journal aims to reflect state-of-the-art achievements in the field of artificial intelligence and its applications, including knowledge intelligence, perceptual intelligence, machine learning, behavioral intelligence, brain and cognition, and AI chips and applications. Original research and review articles from all over the world are welcome for rigorous peer review and professional publishing support.

 

About SciOpen 

 

SciOpen is a professional open access resource for discovery of scientific and technical content published by Tsinghua University Press and its publishing partners, providing the scholarly publishing community with innovative technology and market-leading capabilities. SciOpen provides end-to-end services across manuscript submission, peer review, content hosting, analytics, and identity management, as well as expert advice to support each journal’s development, offering a range of options across functions such as journal layout, production services, editorial services, marketing and promotions, and online functionality. By digitalizing the publishing process, SciOpen widens the reach, deepens the impact, and accelerates the exchange of ideas.

 



DOI

10.26599/AIR.2022.9150001

Article Title

A Phonetic-Semantic Pre-Training Model for Robust Speech Recognition

Article Publication Date

28-Aug-2022


Bioengineer.org © Copyright 2023 All Rights Reserved.
