Departments of Genome Sciences and Medicine, University of Washington School of Medicine, Seattle, Washington 98195, USA.
Genome Res. 2012 Sep;22(9):1602-11. doi: 10.1101/gr.146506.112.
In its first production phase, The ENCODE Project Consortium (ENCODE) has generated thousands of genome-scale data sets, resulting in a genomic "parts list" that encompasses transcripts, sites of transcription factor binding, and other functional features that now number in the millions of distinct elements. These data are reshaping many long-held beliefs concerning the information content of the human and other complex genomes, including the very definition of the gene. Here I discuss and place in context many of the leading findings of ENCODE, as well as trends that are shaping the generation and interpretation of ENCODE data. Finally, I consider prospects for the future, including maximizing the accuracy, completeness, and utility of ENCODE data for the community.