Zhang Zhengdong D, Paccanaro Alberto, Fu Yutao, Weissman Sherman, Weng Zhiping, Chang Joseph, Snyder Michael, Gerstein Mark B
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA.
Genome Res. 2007 Jun;17(6):787-97. doi: 10.1101/gr.5573107.
The comprehensive inventory of functional elements in 44 human genomic regions carried out by the ENCODE Project Consortium enables for the first time a global analysis of the genomic distribution of transcriptional regulatory elements. In this study we developed an intuitive and yet powerful approach to analyze the distribution of regulatory elements found in many different ChIP-chip experiments on a 10-100-kb scale. First, we focus on the overall chromosomal distribution of regulatory elements in the ENCODE regions and show that it is highly nonuniform. We demonstrate, in fact, that regulatory elements are associated with the location of known genes. Further examination on a local, single-gene scale shows an enrichment of regulatory elements near both transcription start and end sites. Our results indicate that overall these elements are clustered into regulatory-rich "islands" and regulatory-poor "deserts." Next, we examine how consistent the nonuniform distribution is between different transcription factors. We perform a multivariate analysis on all the factors in the framework of a biplot, which enhances biological signals in the experiments. This analysis groups transcription factors into sequence-specific and sequence-nonspecific clusters. Moreover, with experimental variation carefully controlled, detailed correlations show that the distribution of sites was generally reproducible for a specific factor between different laboratories and microarray platforms. Data sets associated with histone modifications have particularly strong correlations. Finally, we show how the correlations between factors change when only regulatory elements far from the transcription start sites are considered.
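The binning, correlation, and biplot steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy site positions, the 10-bp bin size, and the choice of a biplot built from the singular value decomposition of a column-centered count matrix are all assumptions made for illustration.

```python
import numpy as np

def bin_counts(sites, region_length, bin_size):
    """Count sites per fixed-width bin.

    `sites` maps a factor name to a list of 0-based positions
    (hypothetical input format, not taken from the paper).
    """
    n_bins = -(-region_length // bin_size)  # ceiling division
    names = sorted(sites)
    counts = np.zeros((n_bins, len(names)))
    for j, name in enumerate(names):
        for pos in sites[name]:
            counts[pos // bin_size, j] += 1
    return names, counts

def biplot_coordinates(counts):
    """Return 2-D biplot coordinates for bins (rows) and factors (columns).

    Assumed form: SVD of the column-centered matrix, with the singular
    values assigned to the row points.
    """
    centered = counts - counts.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    row_points = u[:, :2] * s[:2]  # one point per genomic bin
    col_points = vt[:2].T          # one point per factor
    return row_points, col_points

# Toy data: factors A and B bind the same bins, factor C binds elsewhere.
toy_sites = {"A": [5, 15, 25, 95], "B": [7, 12, 28, 90], "C": [55, 65]}
names, counts = bin_counts(toy_sites, region_length=100, bin_size=10)

# Pairwise correlation between the binned profiles of the factors.
correlations = np.corrcoef(counts, rowvar=False)

row_points, col_points = biplot_coordinates(counts)
```

In this toy case, factors with similar binned profiles end up with similar column points, which is the property a biplot uses to group factors. Real ChIP-chip data would need bins on the 10-100-kb scale and some control for experimental variation, which this sketch omits.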