NIH-funded resource merges pathogen databases and adds AI capabilities
Credit: PATRIC/University of Chicago
The most valuable weapon against the next deadly disease outbreak may be data. Scientists aiming to stop or prevent the spread of viral or bacterial pathogens need rapid, comprehensive access to datasets on their genomics, structure, function and more, combined with computational tools to quickly analyze data and make predictions using artificial intelligence techniques.
That critical service will be provided by the new Bacterial and Viral Bioinformatics Resource Center (BV-BRC), based at the University of Chicago. The big data resource combines two independent efforts, one at UChicago and one at the J. Craig Venter Institute (JCVI), into a common infrastructure that will support richer scientific data and more powerful analytic tools.
The data resource will be funded by the National Institutes of Health. The contract award totals $43,196,984 over five years if all options are exercised. The primary awardee is the University of Chicago, with sub-awards given to JCVI, the University of Virginia and the Fellowship for Interpretation of Genomes.
“In the next five years, the number of sequenced bacterial and viral samples will exceed 10 million, with the amount and variety of associated data growing in proportion,” said Rick Stevens, Professor of Computer Science at UChicago, Associate Laboratory Director for Computing, Environment and Life Sciences at Argonne National Laboratory, and co-principal investigator of BV-BRC. “This new center will transform existing resources into a scalable platform for comparative bioinformatics, large-scale data analysis, integrative data mining and discovery, and artificial intelligence in support of the infectious disease research community.”
The BRC platforms enable researchers who are not bioinformatics experts to get the full value from pathogen data. The new BV-BRC will combine the data, technology, and extensive user communities of two long-running centers: PATRIC, the bacterial system, and IRD/ViPR, the viral systems. Currently, PATRIC hosts 200,000 bacterial genomes and IRD/ViPR hosts 1.5 million viral genomes. In addition, the two resources host data on protein structure and function, clinical studies, drug targets and resistance, epidemiology, and other features, and provide open-source tools for data analysis and genomic annotation.
“The merging of the PATRIC and IRD/ViPR resources is an exciting opportunity that will allow us to combine the complementary expertise in genomics and high-performance computing of these two teams and take advantage of the latest developments in big data analytics and artificial intelligence,” said Richard Scheuermann, La Jolla Campus Director of the J. Craig Venter Institute, Adjunct Professor of Pathology at the University of California San Diego, and co-principal investigator of BV-BRC.
Beyond combining datasets and infrastructure, the BV-BRC will enable next-generation infectious disease research by creating new datasets and tools that help researchers take advantage of cloud and high-performance computing, machine learning, and other advanced informatics approaches.
For example, researchers are increasingly building machine learning models based on genomic data that infer pathogen function and virulence risk, identify new drug targets and mechanisms of drug resistance, or even predict clinical outcomes in patients treated with particular antibiotics or antivirals. The BV-BRC will provide data in structures optimized for machine learning approaches, create tools for researchers to construct new models, and host predictive models for use by other members of the community.
“With the vast proliferation of available data and analysis techniques, we’re excited about joining forces to give researchers, scientists, and physicians access to advanced tools, state-of-the-art methods, and critical infrastructure to accelerate the state of pathogenic research and response, as well as to begin to answer emerging questions in novel ways,” said Stevens.
This project has been funded in whole or in part with Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract No. 75N93019C00076.