College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, China.
Product R&D and Testing Center, Shilin Xingdian Agricultural Products Development Co., Ltd., Kunming, Yunnan 652200, China.
Comput Intell Neurosci. 2022 Oct 3;2022:8626628. doi: 10.1155/2022/8626628. eCollection 2022.
Understanding the mechanism of protein-RNA interaction can help us further explore various biological processes. Experimental techniques still have limitations, such as high cost in both money and time, so predicting protein-RNA binding sites with computational methods is a valuable research tool. Here, we developed a universal method for predicting protein-specific RNA-binding sites, in which a single general model for each protein is constructed on a fixed dataset that fuses data from different experimental techniques. Information theory was employed to characterize the sequence conservation of RNA-binding segments. Conservation difference profiles between binding and nonbinding segments were constructed using information entropy (IE) and indicate a significant difference between the two classes. Finally, 19 protein-specific random forest (RF) models were built on the IE encoding. Performance on independent datasets demonstrates that our method achieves competitive results compared with the current best prediction model.
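As an illustration of the conservation profiles described in the abstract, the following is a minimal sketch of how a position-wise information entropy profile and a binding versus nonbinding difference profile might be computed. It assumes equal-length RNA segments and Shannon entropy in bits; the function names `positional_entropy` and `difference_profile`, the sign convention of the difference, and the toy sequences are all hypothetical and are not taken from the paper, whose exact encoding is not specified here.

```python
import math
from collections import Counter


def positional_entropy(segments):
    """Shannon entropy (in bits) at each position of equal-length segments.

    Low entropy means the position is conserved across the segments.
    """
    if not segments:
        raise ValueError("need at least one segment")
    length = len(segments[0])
    if any(len(s) != length for s in segments):
        raise ValueError("all segments must have the same length")

    profile = []
    for i in range(length):
        # Count how often each nucleotide occurs in column i.
        counts = Counter(s[i] for s in segments)
        total = sum(counts.values())
        h = 0.0
        for n in counts.values():
            p = n / total
            h -= p * math.log2(p)
        profile.append(h)
    return profile


def difference_profile(binding, nonbinding):
    """Per-position entropy of nonbinding minus binding segments.

    Assumed convention: a positive value marks a position that is more
    conserved in the binding set than in the nonbinding set.
    """
    hb = positional_entropy(binding)
    hn = positional_entropy(nonbinding)
    if len(hb) != len(hn):
        raise ValueError("binding and nonbinding segments differ in length")
    return [n - b for b, n in zip(hb, hn)]


# Toy data, invented for illustration only.
binding = ["ACGU", "ACGA", "ACGC", "ACGG"]
nonbinding = ["ACGU", "CGUA", "GUAC", "UACG"]
profile = difference_profile(binding, nonbinding)
```

A vector like `profile`, or entropy-derived values per position of a candidate segment, could then serve as input features to a classifier such as a random forest. How the features are actually derived from the profiles is left open here.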