College of Basic Medical Sciences, Third Military Medical University, Chongqing 400038, China.
Comput Biol Med. 2010 Jul;40(7):621-8. doi: 10.1016/j.compbiomed.2010.04.006. Epub 2010 May 21.
Hidden Markov models (HMMs) have been used extensively in computational molecular biology for modelling protein and nucleic acid sequences. The design of the model architecture and the algorithms for parameter estimation and decoding are critical to the performance of an HMM. In topology prediction of transmembrane beta-barrel proteins (TMBs), the Baum-Welch algorithm is widely adopted for HMM training but usually leads to a sub-optimal model in practice. In addition, all existing HMM-based predictors are designed to model only the transmembrane segment and lack a submodel for the signal peptide (SP) of full-length sequences. This makes it inconvenient for users to investigate the structures of full-length TMB sequences.
We present an HMM that combines a transmembrane barrel submodel and an SP submodel for both topology and SP prediction. A new genetic algorithm (GA) is used to train the model, and the Posterior-Viterbi algorithm is adopted for decoding. A dataset of 33 TMBs, the largest reported in the literature so far, was collected for model training and testing. Results of self-consistency and jackknife tests show that the GA has better global performance than the Baum-Welch algorithm. Results of jackknife tests show that this method performs better than all well-known existing methods for topology prediction. Furthermore, it can predict the SP in full-length TMB sequences with fair accuracy.
We show that our combined HMM-based method is a better choice for TMB topology prediction: it predicts topology with higher accuracy and additionally predicts the SP for full-length TMB sequences.
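As an illustration of the decoding step named above, the following is a minimal Python sketch of Posterior-Viterbi decoding for a generic discrete HMM. It is not the authors' implementation: the function name `posterior_viterbi`, the array layout, and the use of NumPy are assumptions made for this sketch, and the abstract does not give the model architecture or parameters. The idea shown is the standard one: compute posterior state probabilities with the forward-backward algorithm, then run a Viterbi-style dynamic programme that maximises the product of those posteriors over paths that use only allowed (non-zero probability) transitions.

```python
import numpy as np


def posterior_viterbi(obs, start, trans, emit):
    """Return the allowed state path that maximises the product of
    posterior state probabilities (hypothetical helper for this sketch).

    obs   : sequence of observed symbol indices, length T
    start : (N,) initial state probabilities
    trans : (N, N) transition probabilities, trans[i, j] = P(j | i)
    emit  : (N, M) emission probabilities, emit[i, k] = P(symbol k | i)
    """
    T, N = len(obs), len(start)

    # Scaled forward pass.
    fwd = np.zeros((T, N))
    scale = np.zeros(T)
    fwd[0] = start * emit[:, obs[0]]
    scale[0] = fwd[0].sum()
    fwd[0] /= scale[0]
    for t in range(1, T):
        fwd[t] = (fwd[t - 1] @ trans) * emit[:, obs[t]]
        scale[t] = fwd[t].sum()
        fwd[t] /= scale[t]

    # Scaled backward pass, using the same scaling factors.
    bwd = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        bwd[t] = (trans @ (emit[:, obs[t + 1]] * bwd[t + 1])) / scale[t + 1]

    # Posterior probability of each state at each position.
    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)

    # Viterbi-style recursion over log posteriors; forbidden transitions
    # and forbidden start states contribute -inf and are never chosen.
    with np.errstate(divide="ignore"):
        log_post = np.log(post)
    allowed = np.where(trans > 0, 0.0, -np.inf)
    score = np.where(start > 0, log_post[0], -np.inf)
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + allowed  # cand[i, j]: best score via i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_post[t]

    # Trace back from the best final state.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Toy two-state, left-to-right model (hypothetical values, not from the paper).
toy_start = np.array([1.0, 0.0])
toy_trans = np.array([[0.7, 0.3], [0.0, 1.0]])
toy_emit = np.array([[0.9, 0.1], [0.1, 0.9]])
print(posterior_viterbi([0, 0, 1, 1], toy_start, toy_trans, toy_emit))
```

Unlike plain posterior decoding, which picks the most probable state at each position independently, restricting the recursion to allowed transitions guarantees that the returned path is consistent with the model grammar, which matters when states encode alternating topological regions.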