BIOENGINEER.ORG

Sounds familiar: A speaker identity-controllable framework for machine speech translation

Bioengineer by Bioengineer
April 26, 2021
in Health

Researchers propose a deep learning-based model for mimicking and continuously modifying speaker voice identity during speech translation

IMAGE

Credit: Masato Akagi

Ishikawa, Japan – Robots have come a long way from their early inception as insentient machines meant primarily for mechanical assistance to humans. Today, they can assist us intellectually and even emotionally, getting ever better at mimicking conscious humans. An integral part of this ability is the use of speech to communicate with the user (smart assistants such as Google Home and Amazon Echo are notable examples). Despite these remarkable developments, such machines still do not sound very “human”.

This is where voice conversion (VC) comes in. A technology used to change the speaker identity of an utterance from one speaker to another without altering its linguistic content, VC can make human-machine communication sound more “natural” by changing non-linguistic information, for example by adding emotion to speech. “Besides linguistic information, non-linguistic information is also important for natural (human-to-human) communication. In this regard, VC can actually help people be more sociable since they can get more information from speech,” explains Prof. Masato Akagi from Japan Advanced Institute of Science and Technology (JAIST), who works on speech perception and speech processing.

Speech, however, can occur in a multitude of languages (for example, on a language-learning platform), and we often need a machine to act as a speech-to-speech translator. In this case, a conventional VC model has several drawbacks, as Prof. Akagi and his doctoral student at JAIST, Tuan Vu Ho, discovered when they tried to apply their monolingual VC model to a “cross-lingual” VC (CLVC) task. For one, changing the speaker identity led to an undesirable modification of linguistic information. Moreover, their model did not account for cross-lingual differences in the “F0 contour”, an important cue in speech perception, where F0 refers to the fundamental frequency at which the vocal cords vibrate in voiced sounds. It also did not guarantee the desired speaker identity for the output speech.
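To make the notion of F0 concrete, the following is a minimal sketch of how the fundamental frequency of a single voiced frame might be estimated by autocorrelation peak picking. It is an illustration only and is not taken from the study; the function name `estimate_f0`, the 60 to 400 Hz search range, and the synthetic 200 Hz test tone are all assumptions chosen for the example.

```python
import math

def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate F0 (Hz) of one voiced frame via autocorrelation peak picking.

    Hypothetical helper for illustration; real systems use more robust
    pitch trackers and also decide whether a frame is voiced at all.
    """
    lag_min = int(sample_rate / f_max)  # shortest period considered
    lag_max = int(sample_rate / f_min)  # longest period considered
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, min(lag_max, len(frame) - 1) + 1):
        # Similarity between the frame and a copy of itself shifted by `lag`.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic stand-in for a voiced sound: a pure 200 Hz tone, 50 ms at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200.0 * n / SAMPLE_RATE) for n in range(400)]
f0 = estimate_f0(tone, SAMPLE_RATE)
print(f"Estimated F0: {f0:.1f} Hz")
```

Repeating such an estimate frame by frame over an utterance yields an F0 contour, the quantity whose cross-lingual differences the earlier model did not capture.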

Now, in a new study published in IEEE Access, the researchers have proposed a new model suitable for CLVC that allows for both voice mimicking and control of speaker identity of the generated speech, marking a significant improvement over their previous VC model.

Specifically, the new model applies language embedding (mapping natural language text, such as words and phrases, to mathematical representations) to separate language from speaker individuality, together with F0 modeling that gives control over the F0 contour. Additionally, it adopts a deep learning-based training model called a star generative adversarial network, or StarGAN, in addition to their previously used variational autoencoder (VAE) model. Roughly put, a VAE model takes in an input, converts it into a smaller, dense representation, and then reconstructs the original input from that representation, whereas a StarGAN uses two competing networks that push each other to generate improved iterations until the output samples are indistinguishable from natural ones.
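The "compress, then reconstruct" idea behind a VAE can be sketched in a few lines. The toy forward pass below is not the researchers' architecture: the weights are random and untrained, the layers are plain linear maps, and the names `vae_forward`, `linear`, and `random_matrix` as well as the 8-dimensional input are assumptions made purely for illustration. It only shows the data flow: input features, a smaller latent vector sampled from a predicted mean and variance, and a reconstruction of the original size.

```python
import math
import random

def linear(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def vae_forward(features, latent_dim, rng):
    """One untrained VAE-style pass: encode, sample a latent, decode."""
    in_dim = len(features)
    w_mean = random_matrix(latent_dim, in_dim, rng)
    w_logvar = random_matrix(latent_dim, in_dim, rng)
    w_decode = random_matrix(in_dim, latent_dim, rng)

    # Encoder: predict the mean and log-variance of the latent distribution.
    mean = linear(w_mean, features)
    logvar = linear(w_logvar, features)

    # Reparameterization: latent = mean + sigma * noise.
    latent = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
              for m, lv in zip(mean, logvar)]

    # Decoder: map the small latent vector back to the input size.
    reconstruction = linear(w_decode, latent)
    return latent, reconstruction

rng = random.Random(0)
features = [0.2, -0.1, 0.4, 0.0, 0.3, -0.5, 0.1, 0.6]  # stand-in acoustic frame
latent, reconstruction = vae_forward(features, latent_dim=2, rng=rng)
print(len(features), "->", len(latent), "->", len(reconstruction))
```

In a trained model the weights would be optimized so that the reconstruction matches the input; an adversarial setup such as a StarGAN would add a second network that judges whether the generated output sounds natural.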

The researchers showed that their model could be trained in an end-to-end fashion with direct optimization of language embedding during the training and allowed good control of speaker identity. The F0 conditioning also helped remove language dependence of speaker individuality, which enhanced this controllability.

The results are exciting, and Prof. Akagi envisions several future prospects of their CLVC model. “Our findings have direct applications in protection of speaker’s privacy by anonymizing one’s identity, adding sense of urgency to speech during an emergency, post-surgery voice restoration, cloning of voices of historical figures, and reducing the production cost of audiobooks by creating different voice characters, to name a few,” he comments, excitedly. He intends to further improve upon the controllability of speaker identity in future research.

Perhaps the day is not far when smart devices start sounding even more like humans!

###

Reference

Title of original paper: Cross-Lingual Voice Conversion With Controllable Speaker Individuality Using Variational Autoencoder and Star Generative Adversarial Network

Journal: IEEE Access

DOI: 10.1109/ACCESS.2021.3063519

About Japan Advanced Institute of Science and Technology, Japan

Founded in 1990 in Ishikawa prefecture, the Japan Advanced Institute of Science and Technology (JAIST) was the first independent national graduate school in Japan. Now, after 30 years of steady progress, JAIST has become one of Japan’s top-ranking universities. JAIST has multiple satellite campuses and strives to foster capable leaders with a state-of-the-art education system where diversity is key; about 40% of its alumni are international students. The university has a unique style of graduate education based on a carefully designed coursework-oriented curriculum to ensure that its students have a solid foundation on which to carry out cutting-edge research. JAIST also works closely with both local and overseas communities by promoting industry-academia collaborative research.

About Professor Masato Akagi from Japan Advanced Institute of Science and Technology, Japan

Masato Akagi is a professor at the Faculty of the School of Information Science at Japan Advanced Institute of Science and Technology (JAIST). He received his PhD from the Tokyo Institute of Technology, Japan, in 1984 and joined JAIST in 1992. His research interests include speech perception and its modeling in humans, and the signal processing of speech. A senior professor, he has published 456 papers with over 2,500 citations to his credit. For more information about his research, visit: https://www.jaist.ac.jp/english/areas/hld/laboratory/akagi.html#page

Funding information

The study was funded by the National Institute of Informatics-Center for Robust Intelligence and Social Technology (NII-CRIS), a Grant-in-Aid for Scientific Research, and the Japan Society for the Promotion of Science (JSPS)-NSFC Bilateral Joint Research Projects/Seminars.

Media Contact
Masato Akagi
[email protected]

Related Journal Article

http://dx.doi.org/10.1109/ACCESS.2021.3063519

Tags: Hearing/Speech, Language/Linguistics/Speech, Medicine/Health