BIOENGINEER.ORG

GA4GH streaming API htsget a bridge to the future for modern genomic data processing

By Bioengineer
June 22, 2018
in Biology

HINXTON, United Kingdom (June 22, 2018) — The Large Scale Genomics Work Stream of the Global Alliance for Genomics and Health (GA4GH) has announced eight new implementations of its htsget protocol, a standard released in October 2017 for accessing large-scale genomic sequencing data online without using file transfers. The protocol and interoperability testing are reported in a paper released online this week in the journal Bioinformatics.

A cornerstone of understanding common diseases such as cancer and diabetes is the ability to compare the genome sequences of thousands of individuals to identify recurring genetic variants. Since no single institution can amass such a dataset on its own, it is critical for organizations to share information across traditional boundaries.

Historically, this has been done through the use of standardised file formats: a file generated at one institution can be downloaded and integrated with files at another institution because they use the same format.

This approach has worked well since the late 2000s, when these formats were developed as part of the international 1000 Genomes Project, and the formats have enabled a global ecosystem of interoperable sequence analysis tools and pipelines.

But the field is changing. As genomics shifts from a research endeavor to one more broadly implemented in routine clinical care, datasets will become so large that the current model of institutionally siloed file systems will not be sufficient to enable global sharing and collaboration.

"Datasets containing hundreds of millions — rather than hundreds of thousands — of sequences will be available within the next five years and sharing files of that size is simply not realistic," said Ewan Birney, Director of EMBL-EBI and Chair of GA4GH. "Users would have to download terabyte-sized files just to access data on a small subset of the genome sequence."

At the same time, the world is changing: from film to financial data, myriad domains are shifting from traditional file-based approaches for storing and processing data to more modern, cloud-based, big-data approaches. Genomics will have to follow suit, but without sacrificing the current standards that make data interoperable.

"We're not attempting to replace the existing file formats," said Thomas Keane, Team Leader of EGA and the Archive Infrastructure at EMBL-EBI and co-chair of the GA4GH Large Scale Genomics Work Stream and its htsget task team. "Doing so would require adaptation of every single bioinformatics tool for processing data that is currently compatible with those formats."

Instead, htsget provides a consistent protocol for researchers to access data stored in different repositories — whether based in big public clouds or in more traditional infrastructure. It also includes a robust security and authentication mechanism, which is key for sensitive data.
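As a rough illustration of what such a request can look like from the client side, the Python sketch below builds the URL and headers for a region query. It assumes the `/reads/<id>` endpoint, the `format`, `referenceName`, `start` and `end` query parameters, and the bearer-token authorization described in the published htsget specification. The server address, dataset ID and token are hypothetical placeholders, and `build_htsget_request` is an illustrative helper rather than part of any htsget library.

```python
from urllib.parse import quote, urlencode


def build_htsget_request(base_url, read_id, reference_name=None,
                         start=None, end=None, fmt="BAM", token=None):
    """Return (url, headers) for an htsget reads request.

    Assumption: parameter names follow the published htsget spec;
    base_url, read_id and token are placeholders supplied by the caller.
    """
    params = {"format": fmt}
    # Region parameters are optional; omitting them asks for the whole dataset.
    if reference_name is not None:
        params["referenceName"] = reference_name
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end

    url = "{}/reads/{}?{}".format(
        base_url.rstrip("/"), quote(read_id, safe=""), urlencode(params)
    )
    # Protected datasets are assumed to expect an OAuth2 bearer token.
    headers = {"Authorization": "Bearer " + token} if token else {}
    return url, headers


if __name__ == "__main__":
    # Hypothetical server and dataset ID, for illustration only.
    url, headers = build_htsget_request(
        "https://htsget.example.org", "sample1", "chr1", 1000, 2000, token="abc"
    )
    print(url)
    print(headers)
```

Because the same request shape is assumed to work against any compliant server, only the base URL and credentials would change when switching between repositories.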

It can be operated efficiently for very large datasets, and, because it uses the existing standards for transmitting data, it can be readily integrated into current pipelines and analytical methods. Users can employ htsget to download only the subsection of a genome sequence in which they are interested rather than the whole file, or they can download the entire genome as a series of "data slices" distributed across multiple disparate machines.
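The "data slices" idea can be sketched in a few lines of Python. In the published htsget specification, as understood here, the server replies with a JSON "ticket" listing one or more URLs (including inline `data:` URIs), and the client retrieves each block in order and concatenates the bytes into a valid file in the existing format. The function names below are illustrative, and this is a minimal sequential sketch under those assumptions, not a reference client.

```python
import base64
import json
from urllib.parse import unquote_to_bytes
from urllib.request import Request, urlopen


def fetch_block(entry):
    """Return the bytes for one ticket entry (inline data URI or remote URL)."""
    url = entry["url"]
    if url.startswith("data:"):
        # Inline block: everything after the first comma is the payload.
        header, _, payload = url.partition(",")
        if header.endswith(";base64"):
            return base64.b64decode(payload)
        return unquote_to_bytes(payload)
    # Remote block: send any per-URL headers the ticket supplies.
    request = Request(url, headers=entry.get("headers", {}))
    with urlopen(request) as response:
        return response.read()


def assemble_slices(ticket_json):
    """Concatenate all blocks listed in a ticket, preserving their order."""
    ticket = json.loads(ticket_json)["htsget"]
    return b"".join(fetch_block(entry) for entry in ticket["urls"])
```

Since each entry carries its own URL, the blocks may live on different machines. A production client could fetch them in parallel, provided it still joins them in the order the ticket lists them.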

"We've thought of this as a bridge to the future," said Mike Lin, specification maintainer for the GA4GH htsget team. "It's a gradual path to upgrade current file-based pipelines and repositories to a more interoperable, API-based architecture — which has always been a foundational vision of GA4GH."

Lin will lead a webinar on July 24 to introduce the protocol and answer implementation questions from the broader community. Anyone interested in learning more about htsget and how to implement it in their bioinformatics pipelines is invited to attend.

Eight implementations of htsget are now online:

Servers:

European Genome-phenome Archive (EGA), European Bioinformatics Institute (EMBL-EBI) and the Centre for Genomic Regulation (CRG)

Google Cloud Platform

DNAnexus

Wellcome Sanger Institute

Clients:

HTSlib C library, maintained at the Wellcome Sanger Institute and EMBL-EBI

htsget, maintained at the Wellcome Centre for Human Genetics

Browsers:

Integrative Genomics Viewer (IGV) at the Broad Institute of MIT and Harvard

Biodalliance

###

The Global Alliance for Genomics and Health (GA4GH) is an international, nonprofit alliance formed in 2013 to accelerate the potential of research and medicine to advance human health. Bringing together 500+ leading organizations working in healthcare, research, patient advocacy, life science, and information technology, the GA4GH community is working together to create frameworks and standards to enable the responsible, voluntary, and secure sharing of genomic and health-related data. All of our work builds upon the Framework for Responsible Sharing of Genomic and Health-Related Data.

Media Contact

Angela Page
617-714-8048
@GA4GH

www.genomicsandhealth.org

https://www.ga4gh.org/news/-9msBlhISDK_ltjA7Vt6aA.article

Related Journal Article

http://dx.doi.org/10.1093/bioinformatics/bty492

Bioengineer.org © Copyright 2023 All Rights Reserved.