Genome annotation past, present, and future: how to define an ORF at each locus.

Author information

Brent Michael R

Affiliation

Laboratory for Computational Genomics and Department of Computer Science, Washington University, St. Louis, Missouri 63130, USA.

Publication information

Genome Res. 2005 Dec;15(12):1777-86. doi: 10.1101/gr.3866105.

PMID:16339376

Abstract

Driven by competition, automation, and technology, the genomics community has far exceeded its ambition to sequence the human genome by 2005. By analyzing mammalian genomes, we have shed light on the history of our DNA sequence, determined that alternatively spliced RNAs and retroposed pseudogenes are incredibly abundant, and glimpsed the apparently huge number of non-coding RNAs that play significant roles in gene regulation. Ultimately, genome science is likely to provide comprehensive catalogs of these elements. However, the methods we have been using for most of the last 10 years will not yield even one complete open reading frame (ORF) for every gene--the first plateau on the long climb toward a comprehensive catalog. These strategies--sequencing randomly selected cDNA clones, aligning protein sequences identified in other organisms, sequencing more genomes, and manual curation--will have to be supplemented by large-scale amplification and sequencing of specific predicted mRNAs. The steady improvements in gene prediction that have occurred over the last 10 years have increased the efficacy of this approach and decreased its cost. In this Perspective, I review the state of gene prediction roughly 10 years ago, summarize the progress that has been made since, argue that the primary ORF identification methods we have relied on so far are inadequate, and recommend a path toward completing the Catalog of Protein Coding Genes, Version 1.0.
