The National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH), today announced grants totaling more than $80 million over the next four years to expand the ENCyclopedia Of DNA Elements (ENCODE) project, which in its pilot phase yielded provocative new insights into the organization and function of the human genome.
“Based on ENCODE’s early success, we are moving forward with a full-scale initiative to build a parts list of biologically functional elements in the human genome,” said NHGRI Director Francis S. Collins, M.D., Ph.D. “The ENCODE pilot, which looked at just 1 percent of the human genetic blueprint, produced findings that are reshaping many long-held views about our genome. ENCODE’s effort to survey the entire genome will uncover even more exciting surprises, providing us with a more complete picture of the biological roots of human health and disease.”
While the sequencing of the human genome was a major scientific achievement, it was just the first step toward the ultimate goal of using genomic information to diagnose, treat and prevent disease. In recent years, researchers have made major strides in using DNA sequence data to help find genes, which are the parts of the genome that code for proteins. The protein-coding component of these genes, however, makes up just a small fraction of the human genome, about 1.5 percent. There is strong evidence that other parts of the genome have important functions, but very little information exists about where these other functional elements are located and how they work. The ENCODE project aims to fill this critical gap in genomics research.
In June, the ENCODE research consortium published a set of landmark papers in the journals Nature and Genome Research that found the organization, function and evolution of the genome to be far more complicated than most had suspected. For example, while researchers have traditionally focused on studying genes and their associated proteins, the ENCODE data indicate the genome is a very complex, interwoven network in which genes are just one of many types of DNA sequences with functional impact.
“We learned many valuable lessons from the ENCODE pilot project. Among them was the importance of scientific teamwork,” said Elise A. Feingold, Ph.D., program director for ENCODE in NHGRI’s Division of Extramural Research. “Following the pilot’s strong example of multi-disciplinary collaboration, we are confident that the scaled-up ENCODE team will succeed in its quest to build a comprehensive catalog of the components of the human genome that are crucial to biological function.”
In addition to the research grants to support expansion of the ENCODE project, NHGRI also announced awards today for two pilot-scale projects, the establishment of an ENCODE data coordination center, and six projects to develop novel methods and technologies aimed at helping the ENCODE project achieve its goals.
“As was the case for the Human Genome Project and the ENCODE pilot, all of the data generated by the full-scale ENCODE project will be deposited into public databases as soon as they are experimentally verified,” said Peter Good, Ph.D., program director for genome informatics in NHGRI’s Division of Extramural Research. “Free and rapid access to this data will enable researchers around the world to pose new questions and gain new insights into how the human genome functions.”
The principal investigators chosen to receive the ENCODE scale-up grants are:
- Bradley Bernstein, M.D., Ph.D.; Broad Institute of MIT and Harvard, Cambridge, Mass.; $4.8 million (four years); High-Throughput Sequencing of Chromatin Regulatory Elements. Utilizing the technique of chromatin immunoprecipitation followed by high-throughput DNA sequencing, this team will map modifications of histones in various types of human cells. Histones are proteins that play a key role in DNA packaging.
- Gregory Crawford, Ph.D.; Duke University Institute for Genome Sciences & Policy, Durham, N.C.; $6.5 million (four years); Comprehensive Identification of Active Functional Elements in Human Chromatin. These researchers will seek to identify and characterize regions of open chromatin through DNase I hypersensitivity assays, formaldehyde-assisted isolation of regulatory elements and chromatin immunoprecipitation for a few key DNA-binding factors. Chromatin is the complex of DNA and proteins that makes up chromosomes.
- Thomas Gingeras, Ph.D.; Affymetrix Inc., Santa Clara, Calif.; $10.2 million (four years); Comprehensive Characterization and Classification of the Human Transcriptome. This group will identify protein-coding and non-protein-coding ribonucleic acid (RNA) transcripts using microarrays, high-throughput sequencing, sequenced paired-end ditags and sequenced cap analysis of gene expression tags. RNA is an information molecule vital to a number of biological functions, including protein production.
- Tim Hubbard, Ph.D.; Wellcome Trust Sanger Institute, Hinxton, England; $8.5 million (four years); Integrated Human Genome Annotation: Generation of a Reference Gene Set. Using computational methods, manual annotation and targeted experiments, this team will annotate gene features in the human genome. Such features include genes that code for proteins; genes that are transcribed, but do not code for proteins; and pseudogenes, which are DNA sequences similar to normal genes, but which have been altered slightly so they are not functional.
- Richard Myers, Ph.D.; Stanford University, Stanford, Calif.; $14.6 million (four years); Global Annotation of Regulatory Elements in the Human Genome. This group has two goals: to identify transcription factor binding sites by using chromatin immunoprecipitation followed by high-throughput sequencing, and to pilot the use of high-throughput sequencing to determine the methylation status of CpG-rich regions of the human genome. Transcription factors are proteins and enzymes that initiate the transcription of a gene’s DNA sequence into RNA. Methylation refers to a specific chemical modification of DNA, which can silence or reduce the activity of the affected region of DNA.
- Michael Snyder, Ph.D.; Yale University, New Haven, Conn.; $11.5 million (four years); Production Center for Global Mapping of Regulatory Elements. These researchers will identify transcription factor binding sites in the human genome using chromatin immunoprecipitation, followed by high-throughput sequencing.
- John Stamatoyannopoulos, M.D.; University of Washington, Seattle; $9.7 million (four years); A Comprehensive Catalog of Human DNase I Hypersensitive Sites. This team will map and functionally classify DNase I hypersensitive sites across major human cell lineages. It will do this using digital DNase I and histone modification mapping by high-throughput sequencing. DNase I is an enzyme that cleaves DNA at sites where it is exposed by regulatory proteins. DNase I hypersensitive sites mark the location of regulatory elements in the human genome.
The principal investigators chosen to receive the ENCODE pilot-scale grants are:
- Scott Tenenbaum, Ph.D.; University at Albany-State University of New York; $2.2 million (three years); Comprehensive Identification of ENCODE RNA-based, Cis-regulatory Elements. In this pilot project, researchers will strive to identify sites that are targets for RNA-binding proteins through immunoprecipitation coupled with microarrays and high-throughput sequencing.
- Zhiping Weng, Ph.D.; Boston University; $1.5 million (three years); Identification of Transcriptional Factor-Binding Sites in Human Promoters. This pilot project will aim to computationally predict transcription factor binding sites that determine the activities of promoters. Promoters are regions of DNA that serve as binding sites for proteins that guide the initiation of transcription of genes.
The Data Coordination Center for ENCODE will be led by:
- W. James Kent, Ph.D.; University of California, Santa Cruz; $5 million (four years); The UCSC ENCODE Data Coordination Center. This group will collect, organize, store, manage and provide access to data from ENCODE and related projects.
The principal investigators chosen to receive technology development grants are:
- Howard Chang, M.D., Ph.D.; Stanford University, Stanford, Calif.; $1.3 million (three years); Structural Motifs in RNA. These researchers will develop high-throughput methods to predict functional motifs in RNA, to map RNA structure and to assign biological functions to RNA motifs.
- Michael Dorschner, Ph.D.; University of Washington, Seattle; $1.1 million (three years); High-Definition In Vivo Footprinting via Single Molecule Sequencing. This group’s goal is to develop an in vivo method that utilizes single molecule sequencing to identify sites of protein-DNA interaction by differential cleavage sensitivity with ultraviolet light, dimethylsulfate and DNase I.
- John Greally, Ph.D.; Albert Einstein College of Medicine, Bronx, N.Y.; $1.5 million (three years); Massively Parallel Sequencing Technology for the Epigenome. This team will work to develop high-throughput sequencing methods to analyze methylation of cytosine and to map histone modifications.
- Xiaoman Li, Ph.D.; Indiana University, Indianapolis; $870,000 (three years); Discovery of Cis-Regulatory Modules in the Human Genome. This team will strive to develop computational methods for identifying conserved cis-regulatory modules in non-protein-coding regions of the human genome.
- Marcelo Nobrega, M.D., Ph.D.; University of Chicago; $1.5 million (three years); Generation and In Vivo Validation of Cis-regulatory Maps in Eukaryotic Genomes. The two goals of this group are: to develop tagged DNA binding proteins that are recognizable by tag-specific antibodies for use in mapping binding sites for a wide range of proteins, and to develop platforms to test predicted enhancers, silencers and insulators in the human genome.
- Yijun Ruan, Ph.D.; Genome Institute of Singapore; $990,000 (three years); Whole Genome Chromatin Interaction Analysis Using Paired-End diTagging. This team will develop methods to characterize long-range chromatin interactions involved in transcription using high-throughput sequencing.