Computational identification of binding energy hot spots in protein-RNA complexes using an ensemble approach.

Affiliations

School of Software, Central South University, Changsha 410075, China.

School of Electronics and Computer Science, Zhejiang Wanli University, Ningbo 315100, China.

Publication information

Bioinformatics. 2018 May 1;34(9):1473-1480. doi: 10.1093/bioinformatics/btx822.

PMID:29281004

Abstract

MOTIVATION

Identifying RNA-binding residues, especially energetically favored hot spots, can provide valuable clues for understanding the mechanisms and functional importance of protein-RNA interactions. Yet the limited number of experimentally identified energy hot spots in protein-RNA crystal structures makes it difficult to develop empirical identification approaches. Computational prediction of RNA-binding hot spot residues is still in its infancy.

RESULTS

Here, we describe a computational method, PrabHot (Prediction of protein-RNA binding hot spots), that can effectively detect hot spot residues on protein-RNA binding interfaces using an ensemble of conceptually different machine learning classifiers. Residue interaction network features and new solvent exposure characteristics are combined, and the features used for classification are selected with the Boruta algorithm. In particular, two new reference datasets (benchmark and independent) have been generated, containing 107 hot spots from 47 known protein-RNA complex structures. In 10-fold cross-validation on the training dataset, PrabHot achieves promising performance, with an AUC score of 0.86 and a sensitivity of 0.78, significantly better than that of HotSPRing, the pioneering RNA-binding hot spot prediction method. We also demonstrate the capability of the proposed method on the independent test dataset, where it shows a competitive advantage.
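The abstract reports AUC and sensitivity for an ensemble of classifiers but does not say how PrabHot combines the outputs of its ensemble members. The Python sketch below assumes simple soft voting (averaging each classifier's hot-spot probability per residue) and then computes the two reported metrics. The function names, the 0.5 decision threshold, and the toy probabilities are hypothetical illustrations, not PrabHot's code or data.

```python
from statistics import mean


def soft_vote(prob_lists):
    """Average per-residue hot-spot probabilities across classifiers.

    Soft voting is an ASSUMED combination rule; the abstract does not say
    how PrabHot merges its ensemble members.
    """
    return [mean(per_residue) for per_residue in zip(*prob_lists)]


def sensitivity(labels, scores, threshold=0.5):
    """TP / (TP + FN): share of true hot spots (label 1) scored >= threshold.

    The 0.5 threshold is a hypothetical default.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    if tp + fn == 0:
        raise ValueError("no positive (hot spot) labels")
    return tp / (tp + fn)


def auc(labels, scores):
    """ROC AUC as the probability that a random hot spot outscores a random
    non-hot spot (Mann-Whitney formulation; ties count as 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both hot spots and non-hot spots")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 6 interface residues, 3 of them hot spots (label 1),
# scored by 3 hypothetical classifiers.
labels = [1, 1, 1, 0, 0, 0]
clf_a = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
clf_b = [0.8, 0.6, 0.3, 0.5, 0.3, 0.2]
clf_c = [0.7, 0.8, 0.2, 0.4, 0.1, 0.3]

ensemble = soft_vote([clf_a, clf_b, clf_c])
print(round(auc(labels, ensemble), 3))          # 0.889
print(round(sensitivity(labels, ensemble), 3))  # 0.667
```

In a 10-fold cross-validation such as the one described, these metrics would be computed on the held-out residues of each fold rather than on the residues used for training. The pairwise AUC form is used here because it needs no curve integration and equals the area under the ROC curve.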

AVAILABILITY AND IMPLEMENTATION

The PrabHot webserver is freely available at http://denglab.org/PrabHot/.

CONTACT

leideng@csu.edu.cn.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.