College of Computer and Information Engineering, Jiangxi Agricultural University, Zhimin Avenue, Nanchang, China.
First Affiliated Hospital, Gannan Medical University, Medical College Road, Ganzhou, China.
BMC Bioinformatics. 2024 Mar 12;25(1):108. doi: 10.1186/s12859-024-05727-4.
RNA-protein interaction (RPI) is crucial to the life processes of diverse organisms. Researchers have traditionally identified RPIs through time-consuming and costly biological experiments. Although numerous machine learning and deep learning-based methods for predicting RPI exist, their robustness and generalizability leave significant room for improvement. To address these issues, this study proposes LPI-MFF, an RPI prediction model based on multi-source information fusion. LPI-MFF uses protein-protein interaction features, sequence features, secondary structure features, and physicochemical properties as information sources, each with a corresponding coding scheme, and then applies the random forest algorithm for feature screening. Finally, all information is combined and classified with a method based on convolutional neural networks. In fivefold cross-validation, LPI-MFF achieved an accuracy of 97.60% on RPI1807 and 97.67% on NPInter. In addition, its accuracy was 84.9% on the independent test set RPI1168 and 90.91% on the Mus musculus dataset. Accordingly, LPI-MFF demonstrated greater robustness and generalizability than other prevalent RPI prediction methods.
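The encode-then-fuse idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the coding schemes, so everything here is an assumption made for illustration: normalized k-mer frequencies stand in for the per-source encoding, dot-bracket notation stands in for the secondary structure representation, and the helper names (`kmer_frequencies`, `fuse`, `five_fold_indices`) are hypothetical. The random forest screening and the convolutional classifier are not reproduced.

```python
from itertools import product
import random


def kmer_frequencies(seq, alphabet, k):
    """Normalized k-mer frequency vector (an assumed, illustrative coding scheme).

    Positions follow the order of itertools.product(alphabet, repeat=k).
    Windows containing symbols outside `alphabet` are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


def fuse(*feature_vectors):
    """Concatenate per-source feature vectors into one fused vector."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


# Toy inputs (hypothetical, not taken from RPI1807 or NPInter).
rna_vec = kmer_frequencies("AUGGCUAUGC", "ACGU", 2)        # sequence source
structure_vec = kmer_frequencies("((..))....", "().", 2)   # structure source
pair_vec = fuse(rna_vec, structure_vec)                    # fused representation
folds = five_fold_indices(23)                              # fivefold split
```

In a fuller pipeline, each fold would serve once as the held-out set while feature screening and the classifier are fitted on the remaining four, which keeps the screening step from seeing test samples.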