A Consensus Data Mining secondary structure prediction by combining GOR V and Fragment Database Mining.

Authors

Sen Taner Z, Cheng Haitao, Kloczkowski Andrzej, Jernigan Robert L

Affiliation

Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, Iowa 50011-3020, USA.

Publication

Protein Sci. 2006 Nov;15(11):2499-506. doi: 10.1110/ps.062125306. Epub 2006 Sep 25.

Abstract

The major aim of tertiary structure prediction is to obtain protein models with the highest possible accuracy. Fold recognition, homology modeling, and de novo prediction methods typically use predicted secondary structures as input, and all of these methods may significantly benefit from more accurate secondary structure predictions. Although there are many different secondary structure prediction methods available in the literature, their cross-validated prediction accuracy is generally <80%. In order to increase the prediction accuracy, we developed a novel hybrid algorithm called Consensus Data Mining (CDM) that combines our two previous successful methods: (1) Fragment Database Mining (FDM), which exploits the Protein Data Bank structures, and (2) GOR V, which is based on information theory, Bayesian statistics, and multiple sequence alignments (MSA). In CDM, the target sequence is dissected into smaller fragments that are compared with fragments obtained from related sequences in the PDB. For fragments with a sequence identity above a certain sequence identity threshold, the FDM method is applied for the prediction. The remainder of the fragments are predicted by GOR V. The results of the CDM are provided as a function of the upper sequence identities of aligned fragments and the sequence identity threshold. We observe that the value 50% is the optimum sequence identity threshold, and that the accuracy of the CDM method measured by Q(3) ranges from 67.5% to 93.2%, depending on the availability of known structural fragments with sufficiently high sequence identity. As the Protein Data Bank grows, it is anticipated that this consensus method will improve because it will rely more upon the structural fragments.
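The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cdm_consensus` and `q3`, the `Fragment` tuple layout, and the rule that a higher-identity fragment wins where fragments overlap are all assumptions made for illustration. The sketch takes the GOR V and FDM predictions as given strings over the three states H, E, and C, and only shows how a sequence identity threshold (50% by default, the optimum reported above) decides which prediction is used for each residue. `q3` computes the standard three-state accuracy, the percentage of residues whose predicted state matches the observed one.

```python
from typing import List, Tuple

# Hypothetical layout: (start index in the target, sequence identity in [0, 1],
# FDM-predicted states for that fragment as a string over "H", "E", "C").
Fragment = Tuple[int, float, str]


def cdm_consensus(gor_v: str, fragments: List[Fragment], threshold: float = 0.5) -> str:
    """Use FDM states where fragment identity exceeds the threshold, GOR V elsewhere."""
    consensus = list(gor_v)  # default: the GOR V prediction for every residue
    # Assumption: where fragments overlap, the higher-identity fragment wins,
    # so fragments are applied in ascending order of identity.
    for start, identity, fdm_states in sorted(fragments, key=lambda f: f[1]):
        if identity > threshold:
            for offset, state in enumerate(fdm_states):
                if start + offset < len(consensus):  # ignore overhang past the target
                    consensus[start + offset] = state
    return "".join(consensus)


def q3(predicted: str, observed: str) -> float:
    """Three-state accuracy: percentage of residues predicted in the observed state."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("predicted and observed must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    gor_v = "CCCCCCCCCC"
    fragments = [(2, 0.8, "HHHH"), (4, 0.4, "EEEE"), (7, 0.6, "EEE")]
    prediction = cdm_consensus(gor_v, fragments)
    print(prediction, q3(prediction, "CCHHHHHEEE"))
```

In this toy input the fragment with 40% identity falls below the threshold, so its residues keep the GOR V states, while the two fragments above 50% are taken from FDM.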
