College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, People's Republic of China.
J Comput Aided Mol Des. 2018 Dec;32(12):1363-1373. doi: 10.1007/s10822-018-0177-z. Epub 2018 Nov 26.
Identifying protein-RNA binding residues is essential for understanding the mechanism of protein-RNA interactions. To date, binding residues have usually been defined with a rigid distance threshold. However, after investigating 182 non-redundant protein-RNA complexes, we find that a fixed threshold is unsuitable for a number of complexes, because the distances between proteins and RNAs vary widely. In this work, we propose a new definition method based on a flexible distance cutoff. The method accounts for individual differences among complexes by setting a variable tolerance limit for protein-RNA interactions, the double minimum-distance, which yields a different distance threshold for each complex. To validate the method, we compared it comprehensively with traditional rigid methods in terms of interface structure, amino acid composition, interface area, and interaction force, among other properties. The results indicate that the flexible method is more reasonable: it reflects the specificity of each complex by recovering important residues that rigid distance methods miss and by discarding some redundant residues. Finally, to further test the double minimum-distance definition, we developed a classifier that predicts the binding sites derived from the new method, using structural features and a random forest machine learning algorithm. The model achieved satisfactory prediction performance, with an accuracy of 85.0% on independent data sets. To the best of our knowledge, it is the first prediction model to define positive and negative samples using a flexible cutoff. Together, the comparative analysis and the modeling results suggest that our method is a very promising strategy for defining protein-RNA binding sites more precisely.
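The abstract does not spell out how the double minimum-distance cutoff is computed. The sketch below is one plausible reading, offered as an assumption rather than the authors' implementation: the cutoff for a complex is taken as twice the smallest protein-RNA atom distance found in that complex, and a residue counts as binding if any of its atoms lies within that cutoff of an RNA atom. The function names, the `factor` parameter, the 3.5 Å rigid threshold, and the toy coordinates are all hypothetical.

```python
import math


def min_distance(residue_atoms, rna_atoms):
    """Smallest Euclidean distance between any residue atom and any RNA atom."""
    return min(math.dist(a, b) for a in residue_atoms for b in rna_atoms)


def binding_residues_rigid(protein, rna_atoms, cutoff=3.5):
    """Rigid definition: one fixed cutoff (in angstroms) for every complex.

    The 3.5 default is an illustrative value, not taken from the abstract.
    """
    return {
        rid
        for rid, atoms in protein.items()
        if min_distance(atoms, rna_atoms) <= cutoff
    }


def binding_residues_flexible(protein, rna_atoms, factor=2.0):
    """Flexible definition (assumed reading of 'double minimum-distance').

    The cutoff is factor * (closest protein-RNA atom distance in this complex),
    so each complex gets its own threshold.
    Returns (cutoff, set of binding residue ids).
    """
    per_residue = {
        rid: min_distance(atoms, rna_atoms) for rid, atoms in protein.items()
    }
    cutoff = factor * min(per_residue.values())
    binding = {rid for rid, d in per_residue.items() if d <= cutoff}
    return cutoff, binding


# Toy complexes: residue id -> list of (x, y, z) atom coordinates in angstroms.
rna = [(0.0, 0.0, 0.0)]
tight_complex = {
    "A1": [(3.0, 0.0, 0.0)],
    "A2": [(5.0, 0.0, 0.0)],
    "A3": [(7.0, 0.0, 0.0)],
}
loose_complex = {
    "B1": [(4.0, 0.0, 0.0)],
    "B2": [(9.0, 0.0, 0.0)],
}

tight_cutoff, tight_binding = binding_residues_flexible(tight_complex, rna)
loose_cutoff, loose_binding = binding_residues_flexible(loose_complex, rna)
loose_rigid = binding_residues_rigid(loose_complex, rna)
```

In the toy data, the loosely packed complex has no atom within the fixed 3.5 Å threshold, so the rigid definition returns no binding residues at all, while the flexible definition scales its cutoff to that complex and still selects the closest residue. This mirrors, in miniature, the motivation given in the abstract for a per-complex threshold.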