A seqlet-based maximum entropy Markov approach for protein secondary structure prediction.

Author information

Dong Qiwen, Wang Xiaolong, Lin Lei, Guan Yi

Affiliation

School of Computer Science and Technology, Harbin Institute of Technology, China.

Publication information

Sci China C Life Sci. 2005 Aug;48(4):394-405. doi: 10.1360/062004-53.

Abstract

A novel method is presented for predicting the secondary structures of proteins from amino acid sequence. Protein secondary structure seqlets, which are analogous to words in natural language, are extracted. These seqlets capture the relationship between amino acid sequence and protein secondary structure, and together they form a protein secondary structure dictionary. The dictionary is organism-specific. Protein secondary structure prediction is formulated as an integrated word segmentation and part-of-speech tagging problem. A word lattice represents the candidate word segmentations, and a maximum entropy model calculates the probability that a seqlet is tagged as a given secondary structure type. The method is Markovian in the seqlets, which permits efficient exact calculation of the posterior probability distribution over all possible word segmentations and their tags by the Viterbi algorithm. The optimal segmentation and its tags are taken as the predicted secondary structure. The method is applied separately to the proteins of four organisms and compared with the PHD method. The results show that it outperforms PHD by about 3.9% in Q3 accuracy and 4.6% in SOV accuracy. Combining the method with locally similar protein sequences obtained by BLAST gives better predictions. The method is also tested on the 50 CASP5 target proteins, achieving a Q3 accuracy of 78.9% and an SOV accuracy of 77.1%. A web server for protein secondary structure prediction is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.
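The joint segmentation and tagging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the seqlet dictionary `SEQLET_PROBS`, the transition table `TRANS`, the maximum seqlet length `MAX_LEN`, and every probability below are made-up assumptions, and a simple locally normalized product stands in for the trained maximum entropy model. The sketch only shows how a Viterbi search over a word lattice can pick the best segmentation and its tags in one pass.

```python
import math

TAGS = ("H", "E", "C")  # helix, strand, coil
MAX_LEN = 3             # assumed maximum seqlet length (hypothetical)

# Hypothetical seqlet dictionary: base P(tag | seqlet). All numbers are invented.
SEQLET_PROBS = {
    "ALK": {"H": 0.8, "E": 0.1, "C": 0.1},
    "AL": {"H": 0.7, "E": 0.1, "C": 0.2},
    "VI": {"H": 0.1, "E": 0.7, "C": 0.2},
    "G": {"H": 0.1, "E": 0.1, "C": 0.8},
}
UNIFORM = {t: 1.0 / len(TAGS) for t in TAGS}

# Hypothetical tag transition weights; None marks the start of the sequence.
TRANS = {
    None: {"H": 1 / 3, "E": 1 / 3, "C": 1 / 3},
    "H": {"H": 0.3, "E": 0.1, "C": 0.6},
    "E": {"H": 0.1, "E": 0.3, "C": 0.6},
    "C": {"H": 0.4, "E": 0.4, "C": 0.2},
}


def local_prob(seqlet, prev_tag):
    """Stand-in for the maximum entropy model: P(tag | seqlet, prev_tag)."""
    base = SEQLET_PROBS.get(seqlet)
    if base is None:
        if len(seqlet) > 1:
            return None     # multi-residue seqlets must be in the dictionary
        base = UNIFORM      # unknown single residues fall back to uniform
    weights = {t: base[t] * TRANS[prev_tag][t] for t in TAGS}
    z = sum(weights.values())
    return {t: w / z for t, w in weights.items()}


def viterbi_segment(seq):
    """Return the highest-scoring list of (seqlet, tag) pairs covering seq."""
    n = len(seq)
    # best[i][tag] = (log probability, back pointer) for the best path that
    # covers seq[:i] and whose last seqlet carries `tag`.
    best = [dict() for _ in range(n + 1)]
    best[0][None] = (0.0, None)
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_LEN), i):   # lattice edge seq[j:i]
            seqlet = seq[j:i]
            for prev_tag, (prev_score, _) in best[j].items():
                probs = local_prob(seqlet, prev_tag)
                if probs is None:
                    continue
                for tag, p in probs.items():
                    score = prev_score + math.log(p)
                    if tag not in best[i] or score > best[i][tag][0]:
                        best[i][tag] = (score, (j, prev_tag))
    # Trace back from the best final tag.
    tag = max(best[n], key=lambda t: best[n][t][0])
    i, segments = n, []
    while i > 0:
        _, (j, prev_tag) = best[i][tag]
        segments.append((seq[j:i], tag))
        i, tag = j, prev_tag
    segments.reverse()
    return segments


def to_structure_string(segments):
    """Expand tagged seqlets into one structure letter per residue."""
    return "".join(tag * len(seqlet) for seqlet, tag in segments)


if __name__ == "__main__":
    segs = viterbi_segment("ALKGVI")
    print(segs)
    print(to_structure_string(segs))
```

Because each lattice edge is scored only from the seqlet and the previous tag, the search cost grows linearly with sequence length for a fixed `MAX_LEN` and tag set, which is the efficiency property the abstract attributes to the Markov assumption.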
