Applied Bioinformatics Laboratory, The University of Kansas, Lawrence, KS 66047, USA.
Bioinformatics. 2011 Dec 15;27(24):3379-84. doi: 10.1093/bioinformatics/btr579. Epub 2011 Oct 20.
Predicted residue-residue contacts can be useful in predicting protein 3D structures. Current contact prediction algorithms leave room for improvement.
We develop ProC_S3, a set of Random Forest algorithm-based models for predicting residue-residue contact maps. The models are built from a collection of 1490 non-redundant, high-resolution protein structures using >1280 sequence-based features. ProC_S3 also uses a new amino acid residue contact propensity matrix and a new set of seven amino acid groups based on contact preference, both developed for this work. For long-range contacts (sequence separation ≥24), ProC_S3 delivers a 3-fold cross-validated accuracy of 26.9% with a coverage of 4.7% for the top L/5 predictions, where L is the number of residues in a protein. Further benchmark tests on an independent set of 329 proteins deliver an accuracy of 29.7% and a coverage of 5.6%. In the recently completed Ninth Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP9), ProC_S3 ranked first, third and second in accuracy for the top L/5, top L/10 and best 5 predictions of long-range contacts, respectively, among 18 automatic prediction servers.
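The top L/5 accuracy and coverage figures quoted above can be illustrated with a minimal sketch. It assumes the conventional definitions (accuracy is the fraction of selected predictions that are true contacts; coverage is the fraction of all true long-range contacts that are recovered), which the abstract does not spell out. The function name, argument names and input layout are hypothetical and are not taken from ProC_S3.

```python
def top_fraction_metrics(scores, true_contacts, length, divisor=5, min_sep=24):
    """Accuracy and coverage of the top L/divisor long-range predictions.

    scores:        dict mapping a residue pair (i, j), i < j, to a predicted score
    true_contacts: set of residue pairs (i, j), i < j, that are real contacts
    length:        number of residues in the protein (L)
    All names here are illustrative assumptions, not the ProC_S3 interface.
    """
    # Keep only long-range pairs (sequence separation >= min_sep).
    long_range = [(pair, s) for pair, s in scores.items()
                  if pair[1] - pair[0] >= min_sep]
    # Rank by predicted score, highest first, and keep the top L/divisor.
    long_range.sort(key=lambda item: item[1], reverse=True)
    k = max(1, length // divisor)
    selected = [pair for pair, _ in long_range[:k]]

    true_long = {p for p in true_contacts if p[1] - p[0] >= min_sep}
    hits = sum(1 for p in selected if p in true_long)

    accuracy = hits / len(selected) if selected else 0.0
    coverage = hits / len(true_long) if true_long else 0.0
    return accuracy, coverage


# Toy example with a 10-residue chain and a reduced separation threshold.
toy_scores = {(0, 5): 0.9, (1, 2): 0.95, (2, 8): 0.8, (0, 9): 0.4, (3, 7): 0.1}
toy_true = {(0, 5), (0, 9), (3, 7), (1, 2)}
acc, cov = top_fraction_metrics(toy_scores, toy_true, length=10, min_sep=3)
```

In the toy example, two predictions are selected (L/5 = 2) and one of them is a true contact, so the accuracy is 0.5 while only one of the three true long-range contacts is recovered. This shows why a top L/5 evaluation can report moderate accuracy together with low coverage: the number of selected pairs is small relative to the number of true long-range contacts.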
http://www.abl.ku.edu/proc/proc_s3.html.
Supplementary data are available at Bioinformatics online.