Genome annotation past, present, and future: how to define an ORF at each locus.

Author

Brent Michael R

Affiliation

Laboratory for Computational Genomics and Department of Computer Science, Washington University, St. Louis, Missouri 63130, USA.

Publication information

Genome Res. 2005 Dec;15(12):1777-86. doi: 10.1101/gr.3866105.

DOI: 10.1101/gr.3866105
PMID: 16339376

Abstract

Driven by competition, automation, and technology, the genomics community has far exceeded its ambition to sequence the human genome by 2005. By analyzing mammalian genomes, we have shed light on the history of our DNA sequence, determined that alternatively spliced RNAs and retroposed pseudogenes are incredibly abundant, and glimpsed the apparently huge number of non-coding RNAs that play significant roles in gene regulation. Ultimately, genome science is likely to provide comprehensive catalogs of these elements. However, the methods we have been using for most of the last 10 years will not yield even one complete open reading frame (ORF) for every gene--the first plateau on the long climb toward a comprehensive catalog. These strategies--sequencing randomly selected cDNA clones, aligning protein sequences identified in other organisms, sequencing more genomes, and manual curation--will have to be supplemented by large-scale amplification and sequencing of specific predicted mRNAs. The steady improvements in gene prediction that have occurred over the last 10 years have increased the efficacy of this approach and decreased its cost. In this Perspective, I review the state of gene prediction roughly 10 years ago, summarize the progress that has been made since, argue that the primary ORF identification methods we have relied on so far are inadequate, and recommend a path toward completing the Catalog of Protein Coding Genes, Version 1.0.

Similar articles

1
Genome annotation past, present, and future: how to define an ORF at each locus.
Genome Res. 2005 Dec;15(12):1777-86. doi: 10.1101/gr.3866105.
2
Comparative genomics as a tool for gene discovery.
Curr Opin Biotechnol. 2006 Apr;17(2):161-7. doi: 10.1016/j.copbio.2006.01.007. Epub 2006 Feb 3.
3
Steady progress and recent breakthroughs in the accuracy of automated genome annotation.
Nat Rev Genet. 2008 Jan;9(1):62-73. doi: 10.1038/nrg2220.
4
Computational approaches to gene prediction.
J Microbiol. 2006 Apr;44(2):137-44.
5
Re-prediction of protein-coding genes in the genome of Amsacta moorei entomopoxvirus.
J Virol Methods. 2007 Dec;146(1-2):389-92. doi: 10.1016/j.jviromet.2007.07.010. Epub 2007 Aug 23.
6
[Development of antituberculous drugs: current status and future prospects].
Kekkaku. 2006 Dec;81(12):753-74.
7
Cataloging coding sequence variations in human genome databases.
PLoS One. 2008;3(10):e3575. doi: 10.1371/journal.pone.0003575. Epub 2008 Oct 30.
8
Strategies for whole microbial genome sequencing and analysis.
Electrophoresis. 1997 Aug;18(8):1207-16. doi: 10.1002/elps.1150180803.
9
Identification and analysis of genes and pseudogenes within duplicated regions in the human and mouse genomes.
PLoS Comput Biol. 2006 Jun 30;2(6):e76. doi: 10.1371/journal.pcbi.0020076. Epub 2006 May 16.
10
The use of evolutionary biology concepts for genome annotation.
J Exp Zool B Mol Dev Evol. 2007 Jan 15;308(1):26-36. doi: 10.1002/jez.b.21131.

Cited by

1
Subtractive proteomics and molecular docking identify therapeutic targets and drug candidates in drug resistant Klebsiella Michiganensis THO-011.
Sci Rep. 2025 Jul 3;15(1):23776. doi: 10.1038/s41598-025-08107-x.
2
In silico discovery of druggable targets in Citrobacter koseri using echinoderm metabolites and molecular dynamics simulation.
Sci Rep. 2024 Nov 5;14(1):26776. doi: 10.1038/s41598-024-77342-5.
3
Statistical analysis of synonymous and stop codons in pseudo-random and real sequences as a function of GC content.
Sci Rep. 2023 Dec 27;13(1):22996. doi: 10.1038/s41598-023-49626-9.
4
Machine learning in postgenomic biology and personalized medicine.
Wiley Interdiscip Rev Data Min Knowl Discov. 2022 Mar-Apr;12(2). doi: 10.1002/widm.1451. Epub 2022 Jan 24.
5
Splice-site identification for exon prediction using bidirectional LSTM-RNN approach.
Biochem Biophys Rep. 2022 May 26;30:101285. doi: 10.1016/j.bbrep.2022.101285. eCollection 2022 Jul.
6
In-Depth Annotation of the Reveals the Presence of Several Alternative ORFs That Could Encode for Motif-Rich Peptides.
Cells. 2021 Nov 2;10(11):2983. doi: 10.3390/cells10112983.
7
Pseudogene ACTBP2 increases blood-brain barrier permeability by promoting KHDRBS2 transcription through recruitment of KMT2D/WDR5 in Aβ microenvironment.
Cell Death Discov. 2021 Jun 14;7(1):142. doi: 10.1038/s41420-021-00531-y.
8
Review on the Computational Genome Annotation of Sequences Obtained by Next-Generation Sequencing.
Biology (Basel). 2020 Sep 18;9(9):295. doi: 10.3390/biology9090295.
9
Translation Initiation Site Profiling Reveals Widespread Synthesis of Non-AUG-Initiated Protein Isoforms in Yeast.
Cell Syst. 2020 Aug 26;11(2):145-160.e5. doi: 10.1016/j.cels.2020.06.011. Epub 2020 Jul 24.
10
Repertoire-wide gene structure analyses: a case study comparing automatically predicted and manually annotated gene models.
BMC Genomics. 2019 Oct 17;20(1):753. doi: 10.1186/s12864-019-6064-8.