Gross Samuel S, Do Chuong B, Sirota Marina, Batzoglou Serafim
Computer Science Department, Stanford University, Stanford, CA, USA.
Genome Biol. 2007;8(12):R269. doi: 10.1186/gb-2007-8-12-r269.
We describe CONTRAST, a gene predictor that directly incorporates information from multiple alignments rather than employing phylogenetic models. It does so using discriminative machine learning techniques, including a novel training algorithm. We use a two-stage approach in which a set of binary classifiers designed to recognize coding region boundaries is combined with a global model of gene structure. CONTRAST predicts exact coding region structures for 65% more human genes than the previous state-of-the-art method, misses 46% fewer exons, and displays comparable gains in specificity.
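The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the CONTRAST implementation: the three-label set, the boundary names, the hand-written scores, and the `decode` function are all assumptions made for illustration. Stage one is represented by precomputed per-position scores standing in for the outputs of the binary boundary classifiers; stage two is a small Viterbi decoder that plays the role of the global gene-structure model and decides which candidate boundaries form a consistent structure.

```python
from typing import Dict, List

# Toy label set (assumption): N = intergenic, C = coding, I = intron.
LABELS = ["N", "C", "I"]

# Boundary type implied by each allowed change of label (hypothetical names).
BOUNDARY = {
    ("N", "C"): "start",
    ("C", "I"): "donor",
    ("I", "C"): "acceptor",
    ("C", "N"): "stop",
}


def decode(
    n: int,
    boundary_scores: Dict[str, List[float]],
    label_scores: Dict[str, List[float]],
) -> List[str]:
    """Viterbi decoding over n positions.

    boundary_scores[b][i] stands in for a stage-one classifier score for
    boundary type b at position i. It is added when the label changes
    between positions i-1 and i. label_scores[l][i] is a per-position
    score for label l. The sequence is assumed to start and end intergenic.
    """
    neg_inf = float("-inf")
    best = [{lab: neg_inf for lab in LABELS} for _ in range(n)]
    back = [{lab: None for lab in LABELS} for _ in range(n)]
    best[0]["N"] = label_scores["N"][0]

    for i in range(1, n):
        for cur in LABELS:
            for prev in LABELS:
                if best[i - 1][prev] == neg_inf:
                    continue  # previous state is unreachable
                if prev == cur:
                    trans = 0.0
                elif (prev, cur) in BOUNDARY:
                    trans = boundary_scores[BOUNDARY[(prev, cur)]][i]
                else:
                    continue  # label change not allowed by the toy grammar
                score = best[i - 1][prev] + trans + label_scores[cur][i]
                if score > best[i][cur]:
                    best[i][cur] = score
                    back[i][cur] = prev

    # Trace back from the final intergenic state.
    lab = "N"
    path = [lab]
    for i in range(n - 1, 0, -1):
        lab = back[i][lab]
        path.append(lab)
    return path[::-1]


if __name__ == "__main__":
    n = 8
    low = [-5.0] * n
    boundary_scores = {
        "start": low[:2] + [2.0] + low[3:],  # confident start at position 2
        "stop": low[:6] + [2.0] + low[7:],   # confident stop at position 6
        "donor": list(low),
        "acceptor": list(low),
    }
    label_scores = {
        "N": [0.0] * n,
        "C": [-1.0, -1.0, 1.0, 1.0, 1.0, 1.0, -1.0, -1.0],
        "I": [-1.0] * n,
    }
    print("".join(decode(n, boundary_scores, label_scores)))
```

In this sketch the decoder only enters or leaves a coding region where the boundary scores make that worthwhile, which is the sense in which local classifier outputs are combined with a global model. A real system would learn both sets of scores from alignments rather than hard-coding them.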