BIOENGINEER.ORG

Amid genomic data explosion, scientists find proliferating errors

By Bioengineer
April 30, 2019
in Science

[Image credit: WSU]

Pullman, Wash. – Washington State University researchers found a troubling number of errors in publicly available genomic data as they conducted a large-scale analysis of protein sequences.

The work, published in the journal Frontiers in Microbiology, the world’s most cited microbiology journal, could have important implications for future genomic research.

The interdisciplinary team of scientists initially set out to find evidence of a minimal set of proteins that Proteobacteria need for survival. Their dataset consisted of nearly nine million protein sequences clustered by similarity from more than 2,300 bacterial genomes.

A genome is the complete set of genes in a cell or organism, and the genes provide instructions for building the proteins that make up all organisms.

As they searched through the massive dataset for four specific proteins thought to be part of a minimal genome for Proteobacteria, they discovered that only one of the four proteins they were looking for was shared by all the bacteria. They also found large numbers of errors in the publicly available data.

“We found that for each of the proteins, there were mistakes in annotation of their genes, which resulted in truncated or missing sequences,” said Shira Broschat, a professor in the School of Electrical Engineering and Computer Science.
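The kind of check described here can be illustrated with a minimal Python sketch. It is not the WSU team's pipeline: the function names, the toy lengths, and the 80 percent length cutoff (`min_fraction`) are all assumptions made for illustration. For each target protein, it labels every genome's annotation as present at a typical length, suspiciously short, or absent.

```python
from statistics import median


def classify_annotations(lengths_by_genome, target_proteins, min_fraction=0.8):
    """Label each (protein, genome) pair as 'ok', 'truncated', or 'missing'.

    lengths_by_genome: genome ID -> {protein name: annotated length in amino acids}
    min_fraction: hypothetical cutoff; an annotation shorter than this fraction
                  of the median observed length is flagged as truncated.
    """
    report = {}
    for protein in target_proteins:
        # Lengths of this protein in every genome where it is annotated.
        observed = [
            proteins[protein]
            for proteins in lengths_by_genome.values()
            if protein in proteins
        ]
        typical = median(observed) if observed else None

        labels = {}
        for genome_id, proteins in lengths_by_genome.items():
            if protein not in proteins:
                labels[genome_id] = "missing"
            elif proteins[protein] < min_fraction * typical:
                labels[genome_id] = "truncated"
            else:
                labels[genome_id] = "ok"
        report[protein] = labels
    return report


def shared_by_all(report):
    """Return the proteins annotated (at any length) in every genome."""
    return [
        protein
        for protein, labels in report.items()
        if all(label != "missing" for label in labels.values())
    ]


# Toy data, invented for illustration only.
genomes = {
    "genome_A": {"protA": 300, "protB": 450},
    "genome_B": {"protA": 310, "protB": 120},
    "genome_C": {"protA": 295},
}
report = classify_annotations(genomes, ["protA", "protB"])
print(report["protB"])
print(shared_by_all(report))
```

A "missing" or "truncated" label in a sketch like this is only a candidate for closer inspection. A gene can be genuinely absent or naturally shorter in some organisms, so a flag on its own does not prove an annotation error.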

The immense amounts of data being created by next-generation sequencing technologies make the kind of annotation errors the WSU team found especially problematic, said Svetlana Lockwood, lead author on the paper and a PhD graduate in computer science from WSU.

“A single annotation error can propagate rapidly because scientists build on previous annotation when they sequence new genomes,” she said.

While it took 13 years and $2.7 billion to sequence the human genome as part of the Human Genome Project, completed in 2003, that same work can now be done in a single hour for less than $1,500.

“Just in the last two years, researchers have sequenced more than twice the number of bacterial genomes as they did in the twenty years before that,” Broschat said.

While this isn’t the first paper to note the existence of annotation errors, the WSU team’s work lists and explains the various kinds of annotation errors that are currently found in the genomic sequencing data.

“With the scale of mis-annotation we found, researchers have to reevaluate the reliability of publicly available genome data for use in big data applications,” Broschat said.

According to Kelly Brayton, a professor in the Department of Veterinary Microbiology and Pathology, the errors are due to human and technological factors. Errors often happen because of imperfect DNA sequencing technology, which provides the information on the base pairs in DNA segments. They can also occur because of confusion about, or limited knowledge of, the proteins themselves.

The team used state-of-the-art software and a high-performance computing cluster on the Pacific Northwest National Laboratory (PNNL) campus to work on their dataset, the largest of its kind analyzed to date. The data was collected from a database provided by the National Center for Biotechnology Information, part of the United States National Library of Medicine, the world’s largest medical library. The work was funded by the National Science Foundation.

Broschat and Brayton are now working on a tool to find annotation errors in biological datasets, which would be of great use to anyone working in the life sciences.

###

Media Contact
Shira Broschat
[email protected]

Related Journal Article

http://dx.doi.org/10.3389/fmicb.2019.00383

Tags: Bacteriology, Bioinformatics, Biology, Computer Science, Genes, Genetics, Microbiology, Technology/Engineering/Computer Science
Bioengineer.org © Copyright 2023 All Rights Reserved.

Welcome Back!

Login to your account below

Forgotten Password?

Retrieve your password

Please enter your username or email address to reset your password.

Log In
No Result
View All Result
  • Homepages
    • Home Page 1
    • Home Page 2
  • News
  • National
  • Business
  • Health
  • Lifestyle
  • Science

Bioengineer.org © Copyright 2023 All Rights Reserved.