HMM-ModE: improved classification using profile hidden Markov models by optimising the discrimination threshold and modifying emission probabilities with negative training sequences.

Authors

Srivastava Prashant K, Desai Dhwani K, Nandi Soumyadeep, Lynn Andrew M

Affiliation

School of Information Technology, Jawaharlal Nehru University, New Delhi, India.

Publication details

BMC Bioinformatics. 2007 Mar 27;8:104. doi: 10.1186/1471-2105-8-104.

DOI:10.1186/1471-2105-8-104
PMID:17389042
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1852395/
Abstract

BACKGROUND

Profile hidden Markov models (HMMs) are statistical representations of protein families derived from patterns of sequence conservation in multiple alignments, and have been used with considerable success to identify remote homologues. These conservation patterns arise from fold-specific signals, shared across multiple families, and function-specific signals unique to each family. The availability of sequences pre-classified according to their function permits the use of negative training sequences to improve the specificity of the HMM, both by optimising the threshold cutoff and by modifying emission probabilities to minimise the influence of fold-specific signals. A protocol to generate family-specific HMMs is described. It first constructs a profile HMM from an alignment of the family's sequences, then uses this model to identify false positives: sequences belonging to other classes that score above the default threshold. Ten-fold cross-validation is used to optimise the discrimination threshold score for the model. The advent of fast multiple alignment methods enables the use of profile alignments to align the true and false positive sequences, and the resulting alignments are used to modify the emission probabilities in the original model.
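The threshold-optimisation step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that scores for positive (family) and negative (other-class) sequences have already been computed by some profile HMM tool. It uses the Matthews correlation coefficient as the selection criterion, which is an assumption, since the abstract does not name one. It also averages the best cutoff over folds of precomputed scores, whereas a full protocol would presumably rebuild the model for each fold. All function names are hypothetical.

```python
import math
import random


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def best_threshold(pos_scores, neg_scores):
    """Return the cutoff (score >= cutoff counts as a hit) with the highest MCC.

    Candidate cutoffs are the observed scores; ties keep the lowest cutoff.
    """
    best_value, best_cutoff = -2.0, None
    for cutoff in sorted(set(pos_scores) | set(neg_scores)):
        tp = sum(s >= cutoff for s in pos_scores)
        fn = len(pos_scores) - tp
        fp = sum(s >= cutoff for s in neg_scores)
        tn = len(neg_scores) - fp
        value = mcc(tp, fp, tn, fn)
        if value > best_value:
            best_value, best_cutoff = value, cutoff
    return best_cutoff


def cross_validated_threshold(pos_scores, neg_scores, k=10, seed=0):
    """Average the best cutoff found on each of k leave-one-fold-out subsets.

    Simplification (assumption): scores are fixed in advance, so only the
    cutoff selection is repeated per fold, not the model building.
    """
    rng = random.Random(seed)
    pos, neg = list(pos_scores), list(neg_scores)
    rng.shuffle(pos)
    rng.shuffle(neg)
    cutoffs = []
    for fold in range(k):
        train_pos = [s for i, s in enumerate(pos) if i % k != fold]
        train_neg = [s for i, s in enumerate(neg) if i % k != fold]
        cutoffs.append(best_threshold(train_pos, train_neg))
    return sum(cutoffs) / len(cutoffs)
```

With well-separated score distributions the selected cutoff sits at the lowest positive score, so no negative sequence passes while every positive one does. With overlapping distributions the criterion trades false positives against false negatives.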

RESULTS

The protocol, called HMM-ModE, was validated on a set of sequences belonging to six sub-families of the AGC family of kinases. These sequences have an average sequence similarity of 63% across the group, although each sub-group has a different substrate specificity. Optimising the discrimination threshold using negative sequences scored against the model improves specificity in test cases from an average of 21% to 98%. Modifying the model probabilities using negative training sequences provides further discrimination in a few cases, raising the average specificity to 99%. Similar improvements were obtained with a sample of G-protein-coupled receptors sub-classified with respect to their substrate specificity, although the average sequence identity across the sub-families is just 20.6%. The protocol is applied in a high-throughput classification exercise on protein kinases.
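To make the reported specificity figures concrete, the sketch below computes sensitivity and specificity at a given score cutoff. It assumes the common definitions, specificity = TN / (TN + FP) and sensitivity = TP / (TP + FN). The abstract does not state which definition the authors used. The scores are invented for illustration and the function names are hypothetical.

```python
def confusion(pos_scores, neg_scores, cutoff):
    """Count (tp, fp, tn, fn), treating score >= cutoff as a predicted member."""
    tp = sum(s >= cutoff for s in pos_scores)
    fp = sum(s >= cutoff for s in neg_scores)
    return tp, fp, len(neg_scores) - fp, len(pos_scores) - tp


def specificity(pos_scores, neg_scores, cutoff):
    """Fraction of negative sequences correctly rejected: TN / (TN + FP)."""
    _, fp, tn, _ = confusion(pos_scores, neg_scores, cutoff)
    return tn / (tn + fp)


def sensitivity(pos_scores, neg_scores, cutoff):
    """Fraction of positive sequences correctly accepted: TP / (TP + FN)."""
    tp, _, _, fn = confusion(pos_scores, neg_scores, cutoff)
    return tp / (tp + fn)


# Invented toy scores: family members versus sequences from related sub-families.
family = [120.0, 135.0, 150.0, 160.0]
others = [30.0, 40.0, 90.0, 95.0, 100.0]

permissive = specificity(family, others, cutoff=25.0)   # every negative passes
optimised = specificity(family, others, cutoff=110.0)   # no negative passes
```

In this toy case raising the cutoff from 25 to 110 moves specificity from 0.0 to 1.0 without losing any family member, which is the kind of gain a family-specific threshold is meant to deliver when related sub-families score well against each other's models.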

CONCLUSION

The protocol has the potential to maximise the contributions of discriminating residues to classify proteins based on their molecular function, using pre-classified positive and negative sequence training data. The high specificity of the method and the increasing availability of pre-classified sequence data hold the potential for its application in sequence annotation.
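One way to picture how negative sequences could emphasise discriminating residues is the sketch below. It is not the authors' formula, which the abstract does not give. As an illustrative assumption, it estimates per-column residue probabilities from the true-positive and false-positive alignments with a uniform pseudocount, scales each true-positive probability by the ratio of true-positive to false-positive probability, and renormalises. All names and the weighting scheme are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_probs(column, pseudocount=1.0):
    """Residue probabilities for one alignment column (gaps are ignored)."""
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def reweight_emissions(tp_column, fp_column, pseudocount=1.0):
    """Illustrative reweighting: p_tp * (p_tp / p_fp), then renormalise.

    Residues shared by true and false positives keep roughly their original
    weight; residues enriched only in true positives gain weight.
    """
    p_tp = column_probs(tp_column, pseudocount)
    p_fp = column_probs(fp_column, pseudocount)
    raw = {a: p_tp[a] * p_tp[a] / p_fp[a] for a in AMINO_ACIDS}
    norm = sum(raw.values())
    return {a: raw[a] / norm for a in AMINO_ACIDS}


# A column where the family has K but the false positives have R.
discriminating = reweight_emissions("KKKKKKKK", "RRRRRRRR")
# A column that is identical in both sets, so it carries no family signal.
shared = reweight_emissions("LLLL", "LLLL")
```

In the discriminating column the weight on K rises well above its unmodified estimate, while the shared column is left unchanged. This mirrors the stated aim of reducing the influence of fold-specific signals.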

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/384f/1852395/eb7207a88f8b/1471-2105-8-104-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/384f/1852395/8bfebc130308/1471-2105-8-104-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/384f/1852395/35ae510e51f8/1471-2105-8-104-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/384f/1852395/64247e5ff570/1471-2105-8-104-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/384f/1852395/23a8814204c7/1471-2105-8-104-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/384f/1852395/c4a99230a78c/1471-2105-8-104-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/384f/1852395/1010abf9cc76/1471-2105-8-104-7.jpg