Wu Jia Qian, Shteynberg David, Arumugam Manimozhiyan, Gibbs Richard A, Brent Michael R
Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA.
Genome Res. 2004 Apr;14(4):665-71. doi: 10.1101/gr.1959604.
The publication of a draft sequence of a third mammalian genome, that of the rat, suggests a need to rethink genome annotation. New mammalian sequences will not receive the kind of labor-intensive annotation effort currently devoted to the human genome. In this paper, we demonstrate an alternative approach: reverse transcription-polymerase chain reaction (RT-PCR) and direct sequencing based on dual-genome de novo predictions from TWINSCAN. We tested 444 TWINSCAN-predicted rat genes that showed significant homology to known human genes implicated in disease but that were partially or completely missed by methods based on protein-to-genome mapping. Using primers in exons flanking a single predicted intron, we were able to verify the existence of 59% of these predicted genes. We then attempted to amplify the complete predicted open reading frames of 136 genes that had been verified in the single-intron experiment. Spliced sequences were amplified in 46 cases (34%). We conclude that this procedure for elucidating gene structures with native cDNA sequences is cost-effective and will become even more so as it is further optimized.
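As a quick check on the figures quoted above, the following minimal Python sketch recomputes the reported rates. The helper name `success_rate` is hypothetical and is not part of the authors' pipeline. The count of genes verified in the single-intron experiment is not stated in the abstract, so the value derived from the rounded 59% is only an approximation.

```python
def success_rate(successes: int, attempts: int) -> float:
    """Return successes as a percentage of attempts."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return 100.0 * successes / attempts


# Full open-reading-frame experiment: 46 spliced products from 136 attempts.
full_orf_rate = success_rate(46, 136)
print(f"Full-ORF amplification rate: {full_orf_rate:.1f}%")

# Single-intron experiment: the abstract gives only the rounded rate (59%)
# and the number of genes tested (444). The count below is therefore an
# estimate, not a figure reported by the authors.
approx_verified = round(0.59 * 444)
print(f"Approximate number of genes verified: {approx_verified}")
```

The first rate rounds to the 34% given in the abstract. The second figure is consistent with the statement that 136 of the verified genes were carried forward to the full open-reading-frame experiment.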