The international team behind the ENCODE (Encyclopedia of DNA Elements) project has published an overview of its ongoing large-scale effort to interpret the human genome sequence.
The April 19 publication of “A User’s Guide to the Encyclopedia of DNA Elements (ENCODE)” in the journal PLoS Biology provides a guide for using the vast amounts of high-quality data and resources produced so far by the project. All of the data, tools to study them, and the paper itself are freely available through multiple websites accessible through encodeproject.org.
“This project requires collaboration from multiple people all over the world at the cutting edge of their fields, working in a coordinated manner to figure out the function of our human genome,” said Dr. Richard Myers, president and director of the HudsonAlpha Institute for Biotechnology and one of the 25 principal investigators of the project. “The importance extends beyond basic knowledge of who and what we are as humans and into understanding of human health and disease.”
The publication demonstrates how ENCODE data can be immediately useful in interpreting associations between single nucleotides and disease, using examples such as the c-Myc gene and cancer. Similar studies are now possible for the thousands of variants identified in genome-wide association studies, addressing mechanistic questions of susceptibility to disease.
Dr. Ewan Birney, senior scientist at the European Bioinformatics Institute and another principal investigator, commented, “We knew four years ago, from our publication of ENCODE techniques on 1 percent of the genome, that we had an unprecedented view of how biology works on those regions. By extending our work to the entire genome, we see the immediate impact on the interpretation of noncoding variants identified in genome-wide association studies. These studies are disease-driven but have not always yielded clear next steps, and ENCODE data provide those scientists with some new paths to follow.”
Scientists with the ENCODE project are applying up to 20 different tests in 108 commonly used cell lines to compile these data. The current paper not only describes how to find the data but also explains how to apply them to interpret the human genome.
Determining the human DNA sequence alone is like discovering a new language without a key to interpret its letters. The ENCODE project adds data such as where RNA is produced from our DNA, where proteins bind to DNA, and where parts of our DNA are augmented by additional chemical markers. These proteins and chemical additions are keys to understanding how different cells within our bodies interpret the language of DNA.
Source: HudsonAlpha via EurekAlert