Irmtraud M. Meyer, Richard Durbin
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Bioinformatics. 2002 Oct;18(10):1309-18. doi: 10.1093/bioinformatics/18.10.1309.
We present a novel comparative method for the ab initio prediction of protein-coding genes in eukaryotic genomes. The method simultaneously predicts the gene structures of two unannotated, homologous input DNA sequences and retrieves the subsequences that are conserved between them. It can predict partial, complete and multiple genes, and can align pairs of genes that differ by exon-fusion or exon-splitting events. The method employs a probabilistic pair hidden Markov model. We generate annotations from this model with two different algorithms: the Viterbi algorithm in its linear-memory implementation, and a new heuristic algorithm, called the stepping stone algorithm, whose memory and time requirements both scale linearly with the sequence length. We have implemented the model in a computer program called DOUBLESCAN. In this article, we introduce the method and confirm the validity of the approach on a test set of 80 pairs of orthologous DNA sequences from mouse and human. More information can be found at: http://www.sanger.ac.uk/Software/analysis/doublescan/
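To illustrate the general idea of Viterbi decoding on a pair hidden Markov model, here is a minimal sketch in Python. It is not the DOUBLESCAN model, which also has to represent gene structure. It is the textbook three-state pair HMM (M emits an aligned pair, X and Y emit a base from one sequence against a gap). The function name `pair_hmm_viterbi` and the parameters `p_match`, `delta` and `epsilon` are assumptions chosen for illustration, with toy values. The sketch keeps the full dynamic-programming tables, so its memory use is quadratic; it shows neither the linear-memory Viterbi implementation nor the stepping stone heuristic mentioned in the abstract.

```python
import math

NEG_INF = float("-inf")
STATES = ("M", "X", "Y")
# Number of characters of (x, y) consumed by one emission of each state.
STEP = {"M": (1, 1), "X": (1, 0), "Y": (0, 1)}


def pair_hmm_viterbi(x, y, p_match=0.9, delta=0.1, epsilon=0.3):
    """Most probable state path of a toy 3-state pair HMM (hypothetical parameters).

    Returns (log_probability, aligned_x, aligned_y).
    """
    # Toy emission model over a 4-letter alphabet: 4 identical pairs share
    # p_match, 12 differing pairs share the remainder; gaps emit uniformly.
    log_pair_same = math.log(p_match / 4)
    log_pair_diff = math.log((1 - p_match) / 12)
    log_single = math.log(0.25)
    # Allowed transitions; direct X <-> Y moves are not permitted.
    log_trans = {
        ("M", "M"): math.log(1 - 2 * delta),
        ("M", "X"): math.log(delta),
        ("M", "Y"): math.log(delta),
        ("X", "M"): math.log(1 - epsilon),
        ("X", "X"): math.log(epsilon),
        ("Y", "M"): math.log(1 - epsilon),
        ("Y", "Y"): math.log(epsilon),
    }

    n, m = len(x), len(y)
    score = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in STATES}
    back = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in STATES}
    score["M"][0][0] = 0.0  # the start is treated as state M

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            for s in STATES:
                di, dj = STEP[s]
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue
                if s == "M":
                    emit = log_pair_same if x[pi] == y[pj] else log_pair_diff
                else:
                    emit = log_single
                # Pick the best predecessor state at the previous cell.
                best_prev, best_val = None, NEG_INF
                for r in STATES:
                    val = score[r][pi][pj] + log_trans.get((r, s), NEG_INF)
                    if val > best_val:
                        best_prev, best_val = r, val
                if best_prev is not None:
                    score[s][i][j] = best_val + emit
                    back[s][i][j] = best_prev

    # Termination: best state at the final cell (no explicit end state).
    state = max(STATES, key=lambda s: score[s][n][m])
    best = score[state][n][m]

    # Traceback from (n, m) to (0, 0).
    i, j = n, m
    aligned_x, aligned_y = [], []
    while i > 0 or j > 0:
        di, dj = STEP[state]
        aligned_x.append(x[i - 1] if di else "-")
        aligned_y.append(y[j - 1] if dj else "-")
        prev = back[state][i][j]
        i, j = i - di, j - dj
        state = prev
    return best, "".join(reversed(aligned_x)), "".join(reversed(aligned_y))
```

Calling `pair_hmm_viterbi("ACGT", "AGT")` returns the log probability of the best path together with the two gapped strings of the alignment. Working in log space avoids numerical underflow on longer sequences.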