Latest findings from the third phase of the ENCODE project shed new light on the largely mysterious 98 percent of the genome that doesn’t include protein-coding genes
Researchers at University of California San Diego School of Medicine are among the contributors to a package of 10 studies, published July 29, 2020, in the journal Nature, describing the latest results from the ongoing Encyclopedia of DNA Elements (ENCODE) project, a worldwide effort led by the National Institutes of Health (NIH) to understand how the human genome functions.
This third series of published papers includes descriptions of millions of candidate DNA “switches” from the human and mouse genomes that appear to regulate when and where genes are turned on; a new registry that assigns a portion of these DNA switches to useful biological categories; and new visualization tools to assist in the use of ENCODE’s large datasets.
To assess the potential functions of different DNA regions, ENCODE researchers studied biochemical processes that are typically associated with the switches that regulate genes. This biochemical approach is an efficient way to explore the entire genome rapidly and comprehensively, and it helps locate “candidate functional elements”: DNA regions that researchers predict are functional based on these biochemical properties. Researchers can then test the candidates in further experiments to identify and characterize their functional roles in gene regulation.
“A key challenge in ENCODE is that different genes and functional regions are active in different cell types,” said Elise Feingold, PhD, scientific advisor for strategic implementation in the Division of Genome Sciences at the National Human Genome Research Institute, part of the NIH, and a lead on ENCODE for the institute. “This means that we need to test a large and diverse number of biological samples to work towards a catalog of candidate functional elements in the genome.”
Research teams headed by Bing Ren, PhD, professor of cellular and molecular medicine, director of the Center for Epigenomics at UC San Diego School of Medicine and a member of the San Diego branch of the Ludwig Institute for Cancer Research, and Gene Yeo, PhD, professor of cellular and molecular medicine at UC San Diego School of Medicine, each published a study in the latest issue of Nature.
The laboratory mouse is widely used in biomedical research to model human biology and disease. One key advantage of the mouse is that researchers have access to stages of fetal development that are difficult or impossible to study in humans. In the study by Ren, with first author David Gorkin, PhD, and colleagues, scientists profiled the chromatin structure in 12 different tissues over multiple stages of development in the mouse, revealing information about the activity of underlying genes during an organism’s development. They then used the maps to define more than 500,000 candidate transcriptional regulatory elements in the mouse genome, which serve to control when and where the more than 20,000 genes in the mouse genome are activated during fetal development.
Their results have two major implications for human disease research, said Ren.
“First, we found that genetic risk variants for a variety of human diseases are located in regions of the human genome that share evolutionary origins with the mapped mouse gene regulatory sequences, suggesting that these regions of the human genome may be functional in humans, and that some of the disease risk variants in them may be affecting fetal development.
“Second, our results provide important tools for researchers using mice to model human disease, helping them focus on specific tissues, developmental stages and regions of the genome that may be most relevant to their targeted disease.”
In Yeo’s study, with first author Eric L. Van Nostrand, PhD, and colleagues, researchers introduce a new set of regulatory elements embedded in human DNA. RNA-binding proteins recognize and interact with these elements only after they are transcribed into RNA. The study authors write that these RNA-binding proteins act as computers that read the RNA elements and act on their instructions to “process” RNA, for example by trimming excess pieces (splicing) or adding “tails” to stabilize messages.
The work represents the most comprehensive dataset yet of RNA-binding protein interactomes.
Yeo said the findings are significant because they reveal novel functions for many RNA-binding proteins and identify genetic elements that will serve as the foundation to decipher which human mutations cause misregulated or failed RNA “processing.” The Yeo team has already leveraged the information to create principles of regulation by these proteins, highlighting the importance of RNA research for new therapeutic areas.
“I am astounded by the complexity of regulation by these RNA-binding proteins and I believe our new datasets will open the doors to every field of disease, genetics and molecular biology,” Yeo said.
Of note: More than two dozen UC San Diego faculty or researchers with UC San Diego affiliations are listed among the co-authors of the 10 published papers in the latest ENCODE issue of Nature.
Significant progress has been made in characterizing protein-coding genes, which comprise less than 2 percent of the human genome. Researchers know much less about the remaining 98 percent of the genome, including how much and which parts of it perform other functions. ENCODE was created to help fill in this knowledge gap.
The human body is composed of trillions of cells, with thousands of types of cells. While all these cells share a common set of DNA instructions, the diverse cell types, such as heart, lung and brain, carry out distinct functions by using the information encoded in DNA differently. The DNA regions that act as switches to turn genes on or off, or to tune the exact levels of gene activity, help drive the formation of distinct cell types in the body and govern their functioning in health and disease.
As a new feature, ENCODE 3 researchers created a resource detailing different kinds of DNA regions and their corresponding candidate functions. A web-based tool called SCREEN allows users to visualize the data supporting these interpretations.
The ENCODE Project began in 2003 and is an extensive collaborative research effort involving groups across the United States and around the world, comprising more than 500 scientists with diverse expertise. It has benefited from and built upon decades of research on gene regulation performed by independent researchers around the world.
ENCODE researchers have created a community resource, ensuring that the project’s data is accessible to any researcher for their studies. These efforts in open science have resulted in more than 2,000 publications from non-ENCODE researchers who used data generated by the ENCODE Project.
“This demonstrates that the encyclopedia is widely used, which is what we had always aimed for,” said Feingold. “Many of these publications are related to human disease, attesting to the resource’s value for relating basic biological knowledge to health research.”