Probabilistic methods of identifying genes in prokaryotic genomes: connections to the HMM theory.

Authors

Azad Rajeev K, Borodovsky Mark

Affiliation

School of Biology and School of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0230, USA.

Publication

Brief Bioinform. 2004 Jun;5(2):118-30. doi: 10.1093/bib/5.2.118.

DOI: 10.1093/bib/5.2.118
PMID: 15260893
Abstract

In this paper, we review developments in probabilistic methods of gene recognition in prokaryotic genomes, with emphasis on connections to the general theory of hidden Markov models (HMMs). We show that the Bayesian method implemented in GeneMark, a frequently used gene-finding tool, can be augmented and reintroduced as a rigorous forward-backward (FB) algorithm for local posterior decoding, as described in HMM theory. Another, earlier developed method, prokaryotic GeneMark.hmm, uses a modification of the Viterbi algorithm for HMMs with duration to identify the most likely global path through hidden functional states given the DNA sequence. The GeneMark and GeneMark.hmm programs are worth using in concert for analysing prokaryotic DNA sequences, which arguably do not follow any exact mathematical model. The new extension of GeneMark using the FB algorithm was implemented in the software program GeneMark.fba. Given the DNA sequence, this program determines the a posteriori probability that each nucleotide belongs to a coding or non-coding region. In addition, for any open reading frame (ORF), it assigns a score defined as a probabilistic measure of all paths through hidden states that traverse the ORF as a coding region. In our tests, the prediction accuracy of GeneMark.fba compared favourably with that of the initial (standard) GeneMark program. Comparison with prokaryotic GeneMark.hmm also demonstrated a certain, yet species-specific, degree of improvement in raw gene detection, i.e. detection of the correct reading frame (and stop codon). Exact gene prediction, which requires precise prediction of the gene start (which in a prokaryotic genome unambiguously defines the reading frame and stop codon, and thus the whole protein product), remains more accurate in GeneMarkS, which uses a more elaborate HMM to address this task specifically.
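
The contrast the abstract draws between local posterior decoding (forward-backward) and the most likely global path (Viterbi) can be illustrated with a toy example. The sketch below is not the GeneMark, GeneMark.hmm, or GeneMark.fba implementation: it assumes a plain two-state HMM (coding / non-coding) with made-up start, transition, and emission probabilities, and it ignores the state durations and codon-position structure that the real programs model. All names (`STATES`, `START`, `TRANS`, `EMIT`, `posterior_decode`, `viterbi`) are hypothetical.

```python
import math

# Toy two-state HMM. All parameters are hypothetical, chosen for illustration only.
STATES = ("noncoding", "coding")
START = {"noncoding": 0.5, "coding": 0.5}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
}


def posterior_decode(seq):
    """Scaled forward-backward: P(state at t | whole sequence) for each t.

    Assumes a non-empty sequence over the alphabet A, C, G, T.
    """
    n = len(seq)
    fwd, scales = [], []
    for t, x in enumerate(seq):
        if t == 0:
            row = {s: START[s] * EMIT[s][x] for s in STATES}
        else:
            row = {
                s: EMIT[s][x] * sum(fwd[-1][r] * TRANS[r][s] for r in STATES)
                for s in STATES
            }
        c = sum(row.values())  # scaling factor keeps values away from underflow
        fwd.append({s: row[s] / c for s in STATES})
        scales.append(c)

    bwd = [None] * n
    bwd[n - 1] = {s: 1.0 for s in STATES}
    for t in range(n - 2, -1, -1):
        x = seq[t + 1]
        bwd[t] = {
            s: sum(TRANS[s][r] * EMIT[r][x] * bwd[t + 1][r] for r in STATES)
            / scales[t + 1]
            for s in STATES
        }
    # With this scaling, the forward-backward product is already the posterior.
    return [{s: fwd[t][s] * bwd[t][s] for s in STATES} for t in range(n)]


def viterbi(seq):
    """Most likely single path through the hidden states (log-space)."""
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []
    for x in seq[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + math.log(TRANS[r][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][x])
            ptr[s] = best
        back.append(ptr)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # trace the best path backwards
        state = ptr[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    dna = "ATATATATATGCGCGCGCGCGC"
    for base, post, state in zip(dna, posterior_decode(dna), viterbi(dna)):
        print(base, f"P(coding)={post['coding']:.3f}", state)
```

Posterior decoding returns a per-nucleotide probability, which is the kind of output the abstract attributes to GeneMark.fba, whereas Viterbi commits to one labelling of the whole sequence. The two can disagree near state boundaries, which is one reason to look at both.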
