HMM-ModE: implementation, benchmarking and validation with HMMER3.

Authors

Sinha Swati, Lynn Andrew Michael

Affiliation

School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India.

Publication details

BMC Res Notes. 2014 Jul 30;7:483. doi: 10.1186/1756-0500-7-483.

DOI:10.1186/1756-0500-7-483

PMID:25073805

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4236727/

Abstract

BACKGROUND

HMM-ModE is a computational method that generates family-specific profile HMMs using negative training sequences. The method optimizes the discrimination threshold using 10-fold cross-validation and modifies the emission probabilities of profiles to reduce common fold-based signals shared with other sub-families. The protocol depends on the program HMMER for HMM profile building and sequence database searching. The recent release of HMMER3 has improved database search speed by several orders of magnitude, allowing large-scale deployment of the method in sequence annotation projects. We have rewritten our existing scripts, both for parsing HMM profiles and for modifying emission probabilities, to upgrade HMM-ModE to HMMER3 and take advantage of its probabilistic inference and high computational speed. The method is benchmarked and tested on a GPCR dataset as an accurate and fast method for functional annotation.
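The threshold-optimization step described above can be illustrated with a minimal sketch. The abstract does not state which criterion is optimized, so this example assumes the Matthews correlation coefficient (MCC) as the objective and averages the best threshold across folds. The function names and the score lists are hypothetical and are not taken from the HMM-ModE scripts.

```python
from math import sqrt


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def best_threshold(pos_scores, neg_scores):
    """Return (threshold, mcc) maximizing MCC (assumed criterion).

    A sequence is called a family member when its bit score >= threshold.
    Ties are resolved in favour of the lowest candidate threshold.
    """
    candidates = sorted(set(pos_scores) | set(neg_scores))
    best = (None, -1.0)
    for t in candidates:
        tp = sum(s >= t for s in pos_scores)
        fn = len(pos_scores) - tp
        fp = sum(s >= t for s in neg_scores)
        tn = len(neg_scores) - fp
        m = mcc(tp, fp, tn, fn)
        if m > best[1]:
            best = (t, m)
    return best


def cross_validated_threshold(folds):
    """Average the per-fold optimal thresholds.

    `folds` is a list of (pos_scores, neg_scores) pairs, one per held-out
    fold (10 pairs for 10-fold cross-validation).
    """
    thresholds = [best_threshold(pos, neg)[0] for pos, neg in folds]
    return sum(thresholds) / len(thresholds)
```

In practice the scores for each fold would come from searching held-out positive and negative sequences against a profile built on the remaining training data; here they are simply passed in as lists of numbers.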

RESULTS

The implementation of this method, which now works with HMMER3, is benchmarked against the earlier version of HMMER, showing that the effect of local-local alignments is marked only for profiles containing a large number of discontinuous match states. The method is tested on a gold-standard set of families, and we report a significant reduction in the number of false positive hits relative to the default HMM profiles. When applied to GPCR sequences, the method improved classification accuracy compared with other methods used to classify the family at different levels of its classification hierarchy.
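As a rough illustration of how false positive hits might be counted, the sketch below parses the tabular output written by `hmmsearch --tblout` and counts hits at or above a score threshold that are not known family members. It assumes the standard tblout layout (target name in the first column, query name in the third, full-sequence bit score in the sixth). The sequence identifiers, family name, and thresholds used with it are hypothetical, and this is not the authors' evaluation code.

```python
def parse_tblout(text):
    """Return a list of (target, query, full_sequence_score) from tblout text."""
    hits = []
    for line in text.splitlines():
        # Skip blank lines and the '#' comment/header lines HMMER writes.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        hits.append((fields[0], fields[2], float(fields[5])))
    return hits


def count_false_positives(hits, family_members, threshold):
    """Count hits scoring >= threshold whose target is not a known family member."""
    return sum(
        1
        for target, _query, score in hits
        if score >= threshold and target not in family_members
    )


# Hypothetical example: two true members and one sequence from another sub-family.
sample = """\
# target name  accession  query name  accession  E-value  score  bias
seqA           -          famX        -          1e-50    170.2  0.1
seqB           -          famX        -          3e-30    105.7  0.0
seqC           -          famX        -          2e-10     40.3  0.2
"""
hits = parse_tblout(sample)
members = {"seqA", "seqB"}
print(count_false_positives(hits, members, threshold=25.0))
print(count_false_positives(hits, members, threshold=60.0))
```

Comparing the count at a permissive threshold with the count at a stricter, family-specific one is the kind of comparison the reduction in false positives refers to.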

CONCLUSIONS

The present findings show that the new version of HMM-ModE is a highly specific method for differentiating between fold (superfamily) and function (family) specific signals, which helps in the functional annotation of protein sequences. Modified profile HMMs of GPCR sequences provide a simple yet highly specific method for classifying the family, predicting sub-family-specific sequences with high accuracy even though sequences in different sub-families share common physicochemical characteristics.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9be3/4236727/bbfa539ccf99/1756-0500-7-483-1.jpg

Similar articles

HMM-ModE--improved classification using profile hidden Markov models by optimising the discrimination threshold and modifying emission probabilities with negative training sequences.

BMC Bioinformatics. 2007 Mar 27;8:104. doi: 10.1186/1471-2105-8-104.

xHMMER3x2: Utilizing HMMER3's speed and HMMER2's sensitivity and specificity in the glocal alignment mode for improved large-scale protein domain annotation.

Biol Direct. 2016 Nov 29;11(1):63. doi: 10.1186/s13062-016-0163-0.

Accelerated Profile HMM Searches.

PLoS Comput Biol. 2011 Oct;7(10):e1002195. doi: 10.1371/journal.pcbi.1002195. Epub 2011 Oct 20.

Improving profile HMM discrimination by adapting transition probabilities.

J Mol Biol. 2004 May 7;338(4):847-54. doi: 10.1016/j.jmb.2004.03.023.

Protein classification based on text document classification techniques.

Proteins. 2005 Mar 1;58(4):955-70. doi: 10.1002/prot.20373.

ModEnzA: Accurate Identification of Metabolic Enzymes Using Function Specific Profile HMMs with Optimised Discrimination Threshold and Modified Emission Probabilities.

Adv Bioinformatics. 2011;2011:743782. doi: 10.1155/2011/743782. Epub 2011 Mar 29.

Hidden Markov models in computational biology. Applications to protein modeling.

J Mol Biol. 1994 Feb 4;235(5):1501-31. doi: 10.1006/jmbi.1994.1104.

Metagenome and Metatranscriptome Analyses Using Protein Family Profiles.

PLoS Comput Biol. 2016 Jul 11;12(7):e1004991. doi: 10.1371/journal.pcbi.1004991. eCollection 2016 Jul.

HMMs in Protein Fold Classification.

Methods Mol Biol. 2017;1552:13-27. doi: 10.1007/978-1-4939-6753-7_2.

Cited by

Approaches to increase the validity of gene family identification using manual homology search tools.

Genetica. 2023 Dec;151(6):325-338. doi: 10.1007/s10709-023-00196-8. Epub 2023 Oct 10.

Proteome-Wide Detection and Annotation of Receptor Tyrosine Kinases (RTKs): RTK-PRED and the TyReK Database.

Biomolecules. 2023 Feb 1;13(2):270. doi: 10.3390/biom13020270.

Implementation of homology based and non-homology based computational methods for the identification and annotation of orphan enzymes: using Mycobacterium tuberculosis H37Rv as a case study.

BMC Bioinformatics. 2020 Oct 19;21(1):466. doi: 10.1186/s12859-020-03794-x.

Systematic Identification and Classification of β-Lactamases Based on Sequence Similarity Criteria: β-Lactamase Annotation.

Evol Bioinform Online. 2018 Sep 10;14:1176934318797351. doi: 10.1177/1176934318797351. eCollection 2018.

Lactobacillus plantarum LP‑Onlly alters the gut flora and attenuates colitis by inducing microbiome alteration in interleukin‑10 knockout mice.

Mol Med Rep. 2017 Nov;16(5):5979-5985. doi: 10.3892/mmr.2017.7351. Epub 2017 Aug 24.

The genome and phenome of the green alga UTEX 3007 reveal adaptive traits for desert acclimatization.

Elife. 2017 Jun 17;6:e25783. doi: 10.7554/eLife.25783.
