Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.

Authors

Krogh A, Larsson B, von Heijne G, Sonnhammer E L

Affiliation

Center for Biological Sequence Analysis, Technical University of Denmark, Building 208, 2800 Lyngby, Denmark.

Publication

J Mol Biol. 2001 Jan 19;305(3):567-80. doi: 10.1006/jmbi.2000.4315.

Abstract

We describe and validate a new membrane protein topology prediction method, TMHMM, based on a hidden Markov model. We present a detailed analysis of TMHMM's performance, and show that it correctly predicts 97-98% of the transmembrane helices. Additionally, TMHMM can discriminate between soluble and membrane proteins with both specificity and sensitivity better than 99%, although the accuracy drops when signal peptides are present. This high degree of accuracy allowed us to reliably predict integral membrane proteins in a large collection of genomes. Based on these predictions, we estimate that 20-30% of all genes in most genomes encode membrane proteins, which is in agreement with previous estimates. We further discovered that proteins with N(in)-C(in) topologies are strongly preferred in all examined organisms, except Caenorhabditis elegans, where the large number of 7TM receptors increases the counts for N(out)-C(in) topologies. We discuss the possible relevance of this finding for our understanding of membrane protein assembly mechanisms. A TMHMM prediction service is available at http://www.cbs.dtu.dk/services/TMHMM/.
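
To illustrate the general idea of labeling residues with a hidden Markov model, the following is a minimal sketch of Viterbi decoding on a toy three-state model. It is not the TMHMM architecture: the state set, the hydrophobic residue set, and every probability in `START`, `TRANS`, and `EMIT_HYDROPHOBIC` are hypothetical values chosen for illustration, and the toy model does not enforce the inside/outside alternation that a real topology model would need.

```python
import math

# Hypothetical toy model; NOT the TMHMM architecture or its trained parameters.
STATES = ("i", "M", "o")  # inside loop, membrane helix, outside loop
HYDROPHOBIC = set("AILMFVWC")  # assumed residue grouping, for illustration only

START = {"i": 0.5, "M": 0.0, "o": 0.5}
TRANS = {
    "i": {"i": 0.9, "M": 0.1, "o": 0.0},
    "M": {"i": 0.05, "M": 0.9, "o": 0.05},
    "o": {"i": 0.0, "M": 0.1, "o": 0.9},
}
# Probability that each state emits a hydrophobic residue.
EMIT_HYDROPHOBIC = {"i": 0.3, "M": 0.8, "o": 0.3}


def log(p):
    """Natural log that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def emit(state, residue):
    """Emission probability of a residue, reduced to hydrophobic vs. other."""
    p = EMIT_HYDROPHOBIC[state]
    return p if residue in HYDROPHOBIC else 1.0 - p


def viterbi(seq):
    """Return the most probable state path for seq as a string of i/M/o labels."""
    if not seq:
        return ""
    # Log-score of the best path ending in each state at the first residue.
    score = {s: log(START[s]) + log(emit(s, seq[0])) for s in STATES}
    back = []  # one dict of back-pointers per later position
    for residue in seq[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + log(TRANS[r][s]))
            score[s] = prev[best] + log(TRANS[best][s]) + log(emit(s, residue))
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("KKKKK" + "L" * 20 + "KKKKK"))
```

With these made-up parameters, a long hydrophobic stretch flanked by charged residues is labeled `M`, while a purely polar sequence never enters the helix state. Working in log space avoids numerical underflow on long sequences.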
