Error-pooling-based statistical methods for identifying novel temporal replication profiles of human chromosomes observed by DNA tiling arrays.

Authors

Park Taesung, Kim Youngchul, Bekiranov Stefan, Lee Jae K

Affiliation

Department of Statistics, Seoul National University, Korea.

Publication information

Nucleic Acids Res. 2007;35(9):e69. doi: 10.1093/nar/gkm130. Epub 2007 Apr 11.

Abstract

Statistical analysis of tiling array data is extremely challenging because of the astronomically large number of sequence probes, the high noise levels of individual probes and the limited number of replicates in these data. To overcome these difficulties, we first developed statistical error estimation and weighted ANOVA modeling approaches for high-density tiling array data; the former is based on an advanced error-pooling method that accurately estimates the heterogeneous technical error of small-sample tiling array data. Using these approaches, we analyzed high-density tiling array data on the temporal replication patterns of human chromosomes 21 and 22 during cell-cycle S phase in synchronized HeLa cells. We found many novel temporal replication patterns, identifying about 26% of over 1 million tiling array sequence probes as significantly differentially replicated across the four 2-h time periods of S phase. Among these differentially replicated probes, 126 941 sequence probes were matched to 417 known genes. The majority of these genes were found to be replicated within one or two consecutive time periods, while the others were replicated at two non-consecutive time periods. In addition, coding regions were found to be more differentially replicated in particular time periods than noncoding regions in the gene-poor chromosome 21 (25% differentially replicated among genic probes versus 18.6% among intergenic probes), while this phenomenon was less prominent in the gene-rich chromosome 22. Rigorous statistical testing for local proximity of differentially replicated genic and intergenic probes was performed to identify significant stretches of differentially replicated sequence regions. From this analysis, we found that adjacent genes were frequently replicated at different time periods, potentially implying the existence of quite dense replication origins. Evaluating the conditional probability significance of identified gene ontology terms on chromosomes 21 and 22, we detected some over-represented molecular functions and biological processes among these differentially replicated genes, such as those relevant to hydrolase, transferase and receptor-binding activities. Some of these results were confirmed, showing >70% consistency with cDNA microarray data that were independently generated in parallel with the tiling arrays. Thus, our improved analysis approaches, specifically designed for high-density tiling array data, enabled us to reliably and sensitively identify many novel temporal replication patterns on human chromosomes.
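
The abstract does not spell out how the error pooling is carried out. As a rough illustration only, the sketch below assumes one common form of the idea: with very few replicates per probe, a single probe's variance is unreliable, so variances are averaged across probes of similar mean intensity. The function name `pooled_error_by_intensity`, the equal-size binning and the `n_bins` parameter are hypothetical choices made for this example, not the authors' published procedure.

```python
from statistics import mean, variance


def pooled_error_by_intensity(replicates, n_bins=4):
    """Return one pooled variance estimate per probe (illustrative sketch).

    replicates: list of per-probe lists of replicate log-intensities,
                each with at least two values.
    Probes are ranked by mean intensity and split into groups of
    near-equal size; every probe receives the average within-probe
    sample variance of its group. This binning scheme is an assumption.
    """
    if not replicates:
        return []
    means = [mean(r) for r in replicates]
    variances = [variance(r) for r in replicates]  # sample variance per probe
    order = sorted(range(len(replicates)), key=lambda i: means[i])
    bin_size = -(-len(order) // n_bins)  # ceiling division
    pooled = [0.0] * len(replicates)
    for start in range(0, len(order), bin_size):
        group = order[start:start + bin_size]
        group_var = mean(variances[i] for i in group)
        for i in group:
            pooled[i] = group_var
    return pooled


# Toy data: two low-intensity and two high-intensity probes, duplicate measurements.
probes = [[1.0, 1.2], [1.1, 1.5], [5.0, 5.1], [5.2, 5.2]]
pooled = pooled_error_by_intensity(probes, n_bins=2)
```

In this toy example the probe with identical duplicates (`[5.2, 5.2]`) no longer gets a variance of zero, since it borrows information from its intensity neighbor. That is the practical motivation for pooling in small-sample settings: it stabilizes the per-probe error estimates that a weighted model would rely on.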


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a640/1888820/4b004e7092b3/gkm130f1.jpg
