Prediction Enhancement of Residue Real-Value Relative Accessible Surface Area in Transmembrane Helical Proteins by Solving the Output Preference Problem of Machine Learning-Based Predictors.

Author information

Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.

Publication information

J Chem Inf Model. 2015 Nov 23;55(11):2464-74. doi: 10.1021/acs.jcim.5b00246. Epub 2015 Oct 20.

DOI:10.1021/acs.jcim.5b00246

PMID:26455366

Abstract

α-Helical transmembrane proteins constitute 25% of the entire human proteome space and are difficult targets for high-resolution wet-lab structural studies, which calls for accurate computational predictors. We present a novel sequence-based method, MemBrain-Rasa, to predict relative solvent accessible surface area (rASA) from primary sequences. MemBrain-Rasa features an ensemble prediction protocol composed of a statistical machine-learning engine, which is trained in the sequential feature space, and a segment template similarity-based engine, which is constructed with solved structures and sequence alignment. We locally constructed a comprehensive database of residue rASA from the solved protein 3D structures in the PDB. This database is searched for segment templates that are expected to be structurally similar to the query sequence's segments. The segment template-based prediction is then fused with the support vector regression outputs using knowledge rules. Our experiments show that pure machine-learning output cannot cover the entire rASA solution space and suffers from a serious prediction preference problem because of the relatively small number of membrane protein structures that can be used as training samples. The template-based engine solves this problem very well, resulting in a significant improvement in prediction performance. MemBrain-Rasa achieves a Pearson correlation coefficient of 0.733 and a mean absolute error of 13.593 on the benchmark dataset, which are 26.4% and 26.1% better than existing predictors, respectively. MemBrain-Rasa represents new progress in structure modeling of α-helical transmembrane proteins. MemBrain-Rasa is available at www.csbio.sjtu.edu.cn/bioinf/MemBrain/.
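The two evaluation metrics reported in the abstract, the Pearson correlation coefficient and the mean absolute error between predicted and observed rASA, can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names and the toy values are hypothetical, and the sketch assumes rASA is expressed on a 0 to 100 percent scale.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(xs, ys):
    """Average absolute difference between paired values."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Toy per-residue rASA values in percent (made up for illustration only).
observed = [5.0, 20.0, 35.0, 60.0, 80.0]
predicted = [10.0, 15.0, 40.0, 50.0, 85.0]

pcc = pearson(observed, predicted)
mae = mean_absolute_error(observed, predicted)
print(f"PCC = {pcc:.3f}, MAE = {mae:.3f}")
```

A higher correlation and a lower error both indicate better agreement, which is why the abstract reports improvements on the two metrics separately.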

