Nekrutenko Anton, Makova Kateryna D, Li Wen-Hsiung
Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA.
Genome Res. 2002 Jan;12(1):198-202. doi: 10.1101/gr.200901.
Comparative genomics is a simple, powerful way to increase the accuracy of gene prediction. In this study, we show the utility of a simple test for identifying protein-coding exons using human/mouse sequence comparisons. The test takes advantage of the fact that in the vast majority of coding regions, synonymous substitutions (K(S)) occur much more frequently than nonsynonymous ones (K(A)), and it uses the K(A)/K(S) ratio as the criterion. We show the following: (1) most human and mouse exons are sufficiently long and have a suitable degree of sequence divergence for the test to perform reliably; (2) the test is suited to the identification of long exons and single-exon genes, which are difficult to predict by current methods; (3) the test has a false-negative rate lower than that of most current gene-prediction methods and a false-positive rate lower than that of all current methods; (4) the test has been automated and can be used in combination with other existing gene-prediction methods.
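To make the K(A)/K(S) criterion concrete, the following Python sketch estimates both quantities from a pair of aligned, in-frame sequences and flags the pair as putatively coding when the ratio falls below a cutoff. It is only an illustration under stated assumptions, not the automated test described in the abstract: it uses a simplified Nei-Gojobori-style count with a Jukes-Cantor correction, it skips codon pairs that differ at more than one position, and the function names (`syn_sites`, `ka_ks`, `looks_coding`) and the default `threshold=1.0` are hypothetical. The abstract does not state which estimator, cutoff, or significance test the authors used.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in a codon (0 to 3).

    Assumed convention: a change to a stop codon counts as nonsynonymous.
    """
    count = 0
    for i in range(3):
        for base in BASES:
            if base == codon[i]:
                continue
            mutant = codon[:i] + base + codon[i + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                count += 1
    return count / 3


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # saturated; the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (K_A, K_S) for two aligned, in-frame sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    s_sites = n_sites = 0.0
    s_diffs = n_diffs = 0
    for i in range(0, len(seq1) - len(seq1) % 3, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        aa1, aa2 = CODON_TABLE.get(c1), CODON_TABLE.get(c2)
        # Skip codons with gaps or ambiguous bases, and stop codons.
        if aa1 is None or aa2 is None or "*" in (aa1, aa2):
            continue
        n_changed = sum(a != b for a, b in zip(c1, c2))
        if n_changed > 1:
            continue  # simplification: multi-hit codons are ignored
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        s_sites += s
        n_sites += 3 - s
        if n_changed == 1:
            if aa1 == aa2:
                s_diffs += 1
            else:
                n_diffs += 1
    ka = jukes_cantor(n_diffs / n_sites) if n_sites else math.nan
    ks = jukes_cantor(s_diffs / s_sites) if s_sites else math.nan
    return ka, ks


def looks_coding(seq1, seq2, threshold=1.0):
    """Crude decision rule: K_A/K_S below a hypothetical cutoff."""
    ka, ks = ka_ks(seq1, seq2)
    return ks > 0 and ka / ks < threshold
```

For example, `looks_coding("ATGCTTGCTAAAGGTCCTGTTACT", "ATGCTCGCTAAAGGTCCTGTTACT")` compares two sequences that differ by a single synonymous change (CTT versus CTC, both leucine), so K(A) is zero and the pair passes the cutoff. A practical test would also need a significance assessment, since short or barely diverged regions give noisy ratios, which is why the abstract stresses exon length and degree of divergence.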