Dewey Colin, Wu Jia Qian, Cawley Simon, Alexandersson Marina, Gibbs Richard, Pachter Lior
Department of Electrical Engineering, University of California-Berkeley, Berkeley, California 94720, USA.
Genome Res. 2004 Apr;14(4):661-4. doi: 10.1101/gr.1939804.
We describe a new method for simultaneously identifying novel homologous genes with identical structure in the human, mouse, and rat genomes by combining pairwise predictions made with the SLAM gene-finding program. Using this method, we found 3698 gene triples in the human, mouse, and rat genomes that are predicted with exactly the same gene structure. We show, both computationally and experimentally, that the introns of these triples are predicted accurately compared with the introns of other ab initio gene prediction sets. Computationally, we compared the introns of these gene triples, as well as those from other ab initio gene finders, with known intron annotations. We show that a unique property of SLAM, namely that it predicts gene structures simultaneously in two organisms, is key to producing sets of predictions that are highly accurate in intron structure when combined with other programs. Experimentally, we performed reverse transcription-polymerase chain reaction (RT-PCR) in both human and rat to test the exon pairs flanking introns from a subset of the gene triples for which the human gene had not been previously identified. By performing RT-PCR on orthologous introns in both the human and rat genomes, we additionally explore the validity of using RT-PCR as a method for confirming gene predictions.
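As an illustration only, the idea of combining pairwise predictions into gene triples can be sketched as follows. This is not the authors' pipeline: the `Gene` record, the `find_triples` function, and the input layout are hypothetical, and "identical structure" is assumed here to mean the same number of exons with the same exon lengths in all three genomes. The abstract does not say which species pairs were run; the sketch assumes human-mouse, human-rat, and mouse-rat pairs are all available.

```python
from typing import List, NamedTuple, Tuple

class Gene(NamedTuple):
    """Hypothetical record for one predicted gene (not SLAM's output format)."""
    chrom: str
    exons: Tuple[Tuple[int, int], ...]  # half-open (start, end) intervals, sorted

Pair = Tuple[Gene, Gene]

def exon_lengths(gene: Gene) -> Tuple[int, ...]:
    """Exon lengths in order; used as the assumed structure signature."""
    return tuple(end - start for start, end in gene.exons)

def find_triples(human_mouse: List[Pair],
                 human_rat: List[Pair],
                 mouse_rat: List[Pair]) -> List[Tuple[Gene, Gene, Gene]]:
    """Keep (human, mouse, rat) triples supported by all three pairwise
    prediction sets and sharing one exon-length signature."""
    rat_by_human = {h: r for h, r in human_rat}   # human gene -> rat partner
    mouse_rat_pairs = set(mouse_rat)              # fast membership test
    triples = []
    for h, m in human_mouse:
        r = rat_by_human.get(h)
        if r is None:
            continue  # human gene was not predicted in the human-rat run
        if (m, r) not in mouse_rat_pairs:
            continue  # mouse-rat run does not confirm this pairing
        if exon_lengths(h) == exon_lengths(m) == exon_lengths(r):
            triples.append((h, m, r))
    return triples
```

Requiring agreement across all three pairwise runs is a deliberately strict design choice in this sketch: it trades sensitivity for confidence in the predicted intron boundaries.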