New Insights From ENCODE Datasets, NIH Issues Finalized Genomic Data Sharing Policy

September 2, 2014

A series of papers has been published in the journal Nature in which researchers conducted detailed comparisons of the worm, fly, and human genomes using data from the Encyclopedia of DNA Elements (ENCODE) and model organism ENCODE (modENCODE) projects. At the same time that these collaborations are dramatically increasing the amount of genetic data available, the NIH has issued its final policy on genomic data sharing (GDS) to promote the sharing of information, which may lead to advances in treatments, while also protecting research participants’ privacy.

Comparative Genomics and Gene Regulation

Though much of current research into targeted therapies for cancer focuses on protein-coding mutations in oncogenes, it is suspected that regulatory mutations also play a key role. Functional investigations of regulatory features present greater challenges than those of coding regions, but 3 reports in Nature use comparative genomics to shed light on these murky regions of the genome.

The National Human Genome Research Institute (NHGRI) funds both the ENCODE project on the human genome and the complementary modENCODE project, which investigates the fly and worm genomes. The goal of the research groups in these 2 consortia is to produce a comprehensive list of the functional elements of these genomes, including protein- and RNA-coding regions and regulatory elements.

One of the studies investigated chromatin organization and associated gene regulation across the 3 species and discovered features shared among human, fly, and worm, including histone-modification patterns.1 Another study detailed comparisons of genomic binding sites for transcription-regulatory factors across the 3 species.2

The authors of a third paper combined results on chromatin features with transcriptional data and developed a “universal model” that can quantitatively predict the expression of both coding and non-coding genes based on chromatin features.3

“One way to describe and understand the human genome is through comparative genomics and studying model organisms,” noted lead author Mark Gerstein, PhD, Albert L. Williams Professor of Biomedical Informatics at Yale University in New Haven, Connecticut, in a statement.4 “The special thing about the worm and fly is that they are very distant from humans evolutionarily, so finding something conserved across all three (human, fly and worm) tells us it is a very ancient, fundamental process.”

Genomic Data Sharing

The rapid expansion in the volume and complexity of genomic data being produced and the increasing sophistication of bioinformatics have highlighted the risks to the privacy of genetic research participants. The NIH has finalized its GDS policy with the goal of protecting individuals’ genomic data while not hindering investigations involving genomic data such as cancer biomarker research.

The policy is based on and expands requirements introduced with the Genome-Wide Association Studies (GWAS) data sharing policy implemented in 2007. It will apply to all NIH-funded, large-scale projects that produce genomic data from human and non-human sources starting with grant applications submitted after January 25, 2015.

The policy builds on the database for Genotypes and Phenotypes (dbGaP) that the NIH created following implementation of the GWAS policy. Under this system, some genomic information is made available to the public without restrictions, while other data are released only if the proposed research use is consistent with the study participants’ original consent.

“Advances in DNA sequencing technologies have enabled NIH to conduct and fund research that generates ever-greater volumes of GWAS and other types of genomic data,” noted Eric Green, MD, PhD, NHGRI director, report co-author and a co-chair of the trans-NIH committee that developed the GDS policy, in a statement. “Access to these data through dbGaP and according to the data management practices laid out in the policy allows researchers to accelerate research by combining and comparing large and information-rich datasets.”5


In addition to expanding the two-tiered system for data access to human-derived cell lines and clinical specimens, other important aspects of the new policy include encouraging investigators to: seek the broadest possible consent from participants to use their data, make data available for access in a timely manner, and use data to promote maximum public benefit.

  1. Ho JW, Jung YL, Liu T, Alver BH, et al. Comparative analysis of metazoan chromatin organization. Nature. 2014;512(7515):449-452.
  2. Boyle AP, Araya CL, Brdlik C, Cayting P, et al. Comparative analysis of regulatory information and circuits across distant species. Nature. 2014;512(7515):453-456.
  3. Gerstein MB, Rozowsky J, Yan KK, Wang D, et al. Comparative analysis of the transcriptome across distant species. Nature. 2014;512(7515):445-448.
  4. Scientists looking across human, fly and worm genomes find shared biology. NIH website. Accessed August 30, 2014.
  5. NIH issues finalized policy on genomic data sharing. NIH website. Accessed August 30, 2014.