
Evaluation of gene structure prediction programs.

Authors

Burset M, Guigó R

Affiliation

Departament d'Informàtica Mèdica, Institut Municipal d'Investigació Mèdica (IMIM), Barcelona, E-08003, Spain.

Publication

Genomics. 1996 Jun 15;34(3):353-67. doi: 10.1006/geno.1996.0298.

DOI:10.1006/geno.1996.0298
PMID:8786136
Abstract

We evaluate a number of computer programs designed to predict the structure of protein coding genes in genomic DNA sequences. Computational gene identification is set to play an increasingly important role in the development of the genome projects, as emphasis turns from mapping to large-scale sequencing. The evaluation presented here serves both to assess the current status of the problem and to identify the most promising approaches to ensure further progress. The programs analyzed were uniformly tested on a large set of vertebrate sequences with simple gene structure, and several measures of predictive accuracy were computed at the nucleotide, exon, and protein product levels. The results indicated that the predictive accuracy of the programs analyzed was lower than originally found. The accuracy was even lower when considering only those sequences that had recently been entered and that did not show any similarity to previously entered sequences. This indicates that the programs are overly dependent on the particularities of the examples they learn from. For most of the programs, accuracy in this test set ranged from 0.60 to 0.70 as measured by the Correlation Coefficient (where 1.0 corresponds to a perfect prediction and 0.0 is the value expected for a random prediction), and the average percentage of exons exactly identified was less than 50%. Only those programs including protein sequence database searches showed substantially greater accuracy. The accuracy of the programs was severely affected by relatively high rates of sequence errors. Since the set on which the programs were tested included only relatively short sequences with simple gene structure, the accuracy of the programs is likely to be even lower when used for large uncharacterized genomic sequences with complex structure. While in such cases, programs currently available may still be of great use in pinpointing the regions likely to contain exons, they are far from being powerful enough to elucidate its genomic structure completely.
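
The abstract reports accuracy at the nucleotide level (a Correlation Coefficient) and at the exon level (the percentage of exons exactly identified) but does not state the formulas. The sketch below shows one way such measures could be computed. It assumes the standard Matthews-style correlation coefficient over per-nucleotide coding/noncoding labels, and it treats an exon as "exactly identified" when both of its boundaries match a predicted exon. The function names, the 'C'/'N' label encoding, and the (start, end) exon tuples are illustrative assumptions, not taken from the paper.

```python
from math import sqrt


def nucleotide_counts(actual, predicted):
    """Count TP, TN, FP, FN over two equal-length label strings.

    Assumed encoding: 'C' marks a coding nucleotide, anything else noncoding.
    """
    if len(actual) != len(predicted):
        raise ValueError("label strings must have equal length")
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        actual_coding = a == "C"
        predicted_coding = p == "C"
        if actual_coding and predicted_coding:
            tp += 1
        elif not actual_coding and not predicted_coding:
            tn += 1
        elif predicted_coding:
            fp += 1  # predicted coding, actually noncoding
        else:
            fn += 1  # actually coding, predicted noncoding
    return tp, tn, fp, fn


def correlation_coefficient(tp, tn, fp, fn):
    """Matthews-style correlation: 1.0 is perfect, about 0.0 is random."""
    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        return float("nan")  # undefined when any marginal total is zero
    return (tp * tn - fn * fp) / denom


def exact_exon_fraction(actual_exons, predicted_exons):
    """Fraction of actual exons whose start and end both match a prediction."""
    actual = set(actual_exons)
    if not actual:
        return float("nan")
    return len(actual & set(predicted_exons)) / len(actual)


if __name__ == "__main__":
    # Toy data: the predicted coding region is shifted by one nucleotide.
    actual = "NNCCCCNNNN"
    predicted = "NNNCCCCNNN"
    print(correlation_coefficient(*nucleotide_counts(actual, predicted)))
    print(exact_exon_fraction([(3, 6), (10, 15)], [(3, 6), (10, 14)]))
```

On the toy data, a one-nucleotide shift still gives a fairly high nucleotide-level score while the shifted exon counts as a complete miss at the exon level, which is why the two kinds of measure can diverge as they do in the abstract.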

