Sinha Swati, Lynn Andrew Michael
School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India.
BMC Res Notes. 2014 Jul 30;7:483. doi: 10.1186/1756-0500-7-483.
HMM-ModE is a computational method that generates family-specific profile HMMs using negative training sequences. The method optimizes the discrimination threshold using 10-fold cross-validation and modifies the emission probabilities of profiles to reduce common fold-based signals shared with other sub-families. The protocol depends on the program HMMER for HMM profile building and sequence database searching. The recent release of HMMER3 has improved database search speed by several orders of magnitude, allowing for large-scale deployment of the method in sequence annotation projects. We have rewritten our existing scripts, both for parsing the HMM profiles and for modifying emission probabilities, to upgrade HMM-ModE to HMMER3 and take advantage of its probabilistic inference and high computational speed. The method is benchmarked and tested on a GPCR dataset as an accurate and fast method for functional annotation.
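The threshold optimization step can be illustrated with a minimal sketch. It assumes that each held-out positive and negative sequence from the cross-validation folds has already been scored against the profile, and it picks the score cutoff that maximizes the Matthews correlation coefficient (MCC). The choice of MCC as the objective and the function names `mcc` and `best_threshold` are assumptions made for illustration; the abstract does not specify them.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def best_threshold(pos_scores, neg_scores):
    """Return (threshold, mcc) for the cutoff that best separates the sets.

    pos_scores: scores of held-out family members (hypothetical input).
    neg_scores: scores of held-out negative sequences (hypothetical input).
    A sequence is called a hit when its score is >= the threshold.
    Ties are resolved in favour of the lowest threshold.
    """
    candidates = sorted(set(pos_scores) | set(neg_scores))
    best_t, best_m = None, -2.0  # MCC lies in [-1, 1]
    for t in candidates:
        tp = sum(s >= t for s in pos_scores)
        fn = len(pos_scores) - tp
        fp = sum(s >= t for s in neg_scores)
        tn = len(neg_scores) - fp
        m = mcc(tp, fp, tn, fn)
        if m > best_m:
            best_t, best_m = t, m
    return best_t, best_m
```

In a 10-fold setting, one plausible use is to pool the held-out scores from all folds and call `best_threshold` once, or to call it per fold and average the resulting cutoffs; which of these the authors use is not stated here.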
The implementation of this method, which now works with HMMER3, is benchmarked against the earlier version of HMMER to show that the effect of local-local alignments is marked only for profiles containing a large number of discontinuous match states. The method is tested on a gold-standard set of families, and we report a significant reduction in the number of false positive hits compared with the default HMM profiles. When applied to GPCR sequences, the method improved classification accuracy over other methods used to classify the family at different levels of its classification hierarchy.
The present findings show that the new version of HMM-ModE is a highly specific method for differentiating between fold (superfamily) and function (family) specific signals, which helps in the functional annotation of protein sequences. The modified profile HMMs of GPCR sequences provide a simple yet highly specific method for classifying the family, predicting sub-family specific sequences with high accuracy even though sequences in different sub-families share common physicochemical characteristics.