Chew David S H, Leung Ming-Ying, Choi Kwok Pui
Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore.
BMC Bioinformatics. 2007 May 21;8:163. doi: 10.1186/1471-2105-8-163.
Replication origins are considered important sites for understanding the molecular mechanisms involved in DNA replication. Many computational methods have been developed for predicting their locations in archaeal, bacterial and eukaryotic genomes. However, a prediction method designed for one kind of genome might not work well for another. In this paper, we propose the AT excursion method, a score-based approach that quantifies local AT abundance in genomic sequences and uses the identified high-scoring segments to predict replication origins. The method requires no preset window size and provides rigorous criteria for evaluating the statistical significance of high-scoring segments.
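As an illustration of the score-based idea, the following is a minimal sketch of an excursion computation, not the authors' implementation. It assumes the standard excursion recursion E_0 = 0, E_i = max(E_{i-1} + s_i, 0), where s_i is the score of the i-th base. The score values (+1.0 for A/T, -1.5 for G/C), the zero score for ambiguous bases, the function names and the `threshold` parameter are all hypothetical choices made for illustration. In particular, a fixed threshold stands in for the statistical significance criteria mentioned above, which are not reproduced here.

```python
from typing import List, Tuple


def base_score(base: str, at_score: float = 1.0, gc_score: float = -1.5) -> float:
    """Score one base. The numeric values are illustrative assumptions."""
    if base in ("A", "T"):
        return at_score
    if base in ("G", "C"):
        return gc_score
    return 0.0  # assumption: ambiguous bases (e.g. N) contribute nothing


def at_excursions(seq: str) -> List[float]:
    """Return E_1..E_n, where E_i = max(E_{i-1} + s_i, 0) and E_0 = 0."""
    values = []
    current = 0.0
    for base in seq.upper():
        current = max(current + base_score(base), 0.0)
        values.append(current)
    return values


def high_scoring_segments(seq: str, threshold: float) -> List[Tuple[int, int, float]]:
    """Return (start, end, peak) for each excursion whose peak >= threshold.

    A segment runs from the position where the excursion leaves zero to the
    position where it reaches its peak, so seq[start:end] is the segment.
    """
    segments = []
    current = 0.0
    start = None      # index where the current excursion left zero
    peak = 0.0        # highest value reached in the current excursion
    peak_end = 0      # exclusive end index of the peak position
    for i, base in enumerate(seq.upper()):
        new = max(current + base_score(base), 0.0)
        if current == 0.0 and new > 0.0:
            start, peak, peak_end = i, new, i + 1
        elif start is not None and new > peak:
            peak, peak_end = new, i + 1
        if new == 0.0 and start is not None:
            # The excursion has returned to zero: report it if high enough.
            if peak >= threshold:
                segments.append((start, peak_end, peak))
            start = None
        current = new
    # An excursion still open at the end of the sequence is also reported.
    if start is not None and peak >= threshold:
        segments.append((start, peak_end, peak))
    return segments
```

Because the excursion resets to zero whenever the cumulative score would go negative, segment boundaries fall out of the recursion itself, which is why no window size needs to be chosen in advance.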
We have evaluated the AT excursion method by checking its predictions against known replication origins in herpesviruses and comparing its performance with an existing base weighted score method (BWS1). Out of 43 known origins, 39 are predicted by at least one of the two methods and 26 are predicted by both. The AT excursion method identifies six origins not predicted by BWS1, showing that it is a valuable complement to BWS1. We have also applied the AT excursion method to two other families of double-stranded DNA viruses, the poxviruses and iridoviruses, for which very few replication origins are documented in the public domain. The prediction results are made available as supplementary materials at [1]. Preliminary investigation shows that the proposed method also works well on some larger genomes.
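The counts reported above imply a full breakdown of the 43 known origins. The short sketch below works through that arithmetic, assuming that the 39 origins are those predicted by at least one of the two methods. Only the four input numbers come from the text; the variable names are illustrative and the derived values are not figures stated by the authors.

```python
# Figures stated in the text.
known_origins = 43
predicted_by_either = 39   # assumed to mean: by at least one method
predicted_by_both = 26
excursion_only = 6         # found by AT excursion but not by BWS1

# Derived under the assumption above.
bws1_only = predicted_by_either - predicted_by_both - excursion_only
excursion_total = predicted_by_both + excursion_only
bws1_total = predicted_by_both + bws1_only
missed_by_both = known_origins - predicted_by_either

print(f"BWS1 only: {bws1_only}")
print(f"AT excursion total: {excursion_total}")
print(f"BWS1 total: {bws1_total}")
print(f"Missed by both: {missed_by_both}")
```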
The AT excursion method will be a useful computational tool for identifying replication origins in a variety of genomic sequences.