Chen Yifan, Fu Xiangzheng, Li Zejun, Peng Li, Zhuo Linlin
College of Information Science and Engineering, Hunan University, Changsha, China.
School of Computer and Information Science, Hunan Institute of Technology, Hengyang, China.
Front Bioeng Biotechnol. 2021 Feb 25;9:647113. doi: 10.3389/fbioe.2021.647113. eCollection 2021.
Long non-coding RNA (lncRNA)-protein interactions play an important role in post-transcriptional gene regulation, including RNA splicing, translation, and signaling, and are involved in the development of complex diseases. Predicting lncRNA-protein interactions therefore helps uncover the functions of lncRNAs and the mechanisms by which they act. Traditional experimental methods for detecting these interactions are expensive and time-consuming, so computational methods offer an effective alternative. However, most recent computational methods use only lncRNA-lncRNA or protein-protein similarity information and cannot fully capture the features needed to identify interactions. In this paper, we propose a novel machine-learning-based computational model for lncRNA-protein interaction prediction. First, we propose a feature representation of the network topological properties of lncRNA-protein interactions. Second, we extract protein features (basic composition features and evolutionary information), lncRNA sequence features, and lncRNA expression profile information. Finally, these features are fused, the fused feature vector is optimized with the recursive feature elimination (RFE) algorithm, and the optimized feature vectors are input into a support vector machine (SVM) classifier. Experimental results show that the proposed method predicts lncRNA-protein interactions effectively and accurately.
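The pipeline summarized in the abstract (feature extraction, feature fusion, RFE-based optimization, SVM classification) can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the exact features or hyperparameters, so the following are all assumptions: the helper names `kmer_composition`, `fuse`, and `build_model` are hypothetical; lncRNA sequence features are modeled as normalized 3-mer frequencies; the protein composition/evolutionary, network-topology, and expression-profile blocks are random placeholder vectors; and fusion is taken to be plain concatenation. RFE wraps a linear SVM (RFE needs a model exposing `coef_`), and an RBF-kernel SVM is trained on the selected features.

```python
from itertools import product

import numpy as np
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def kmer_composition(seq, k=3, alphabet="ACGU"):
    """Hypothetical helper: normalized k-mer frequencies of an RNA sequence (4**k values)."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing N or other symbols
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts


def fuse(*blocks):
    """Hypothetical helper: fuse per-pair feature blocks by concatenation (an assumption)."""
    return np.concatenate([np.asarray(b, dtype=float).ravel() for b in blocks])


def build_model(n_features_to_select=20):
    """Scale -> RFE with a linear SVM -> RBF SVM (hyperparameters are assumptions)."""
    return Pipeline([
        ("scale", StandardScaler()),
        # RFE refits the linear SVM and drops the ~10% of features with the
        # smallest |coef_| per round until n_features_to_select remain.
        ("rfe", RFE(SVC(kernel="linear"),
                    n_features_to_select=n_features_to_select, step=0.1)),
        ("svm", SVC(kernel="rbf", C=1.0, gamma="scale")),
    ])


# Synthetic demo: every block below except the k-mer features is a random
# placeholder, not real protein, topology, or expression data.
rng = np.random.default_rng(0)
n_pairs = 60
X = np.vstack([
    fuse(
        kmer_composition("".join(rng.choice(list("ACGU"), size=200))),  # lncRNA, 64 dims
        rng.normal(size=20),  # placeholder protein composition/evolutionary features
        rng.normal(size=5),   # placeholder network-topology features
        rng.normal(size=10),  # placeholder lncRNA expression profile
    )
    for _ in range(n_pairs)
])
y = np.array([0, 1] * (n_pairs // 2))  # 1 = interacting pair (synthetic labels)

model = build_model(n_features_to_select=20)
model.fit(X, y)
selected = model.named_steps["rfe"].support_  # boolean mask over the fused features
predictions = model.predict(X)
```

Using a linear SVM inside RFE gives a direct per-feature ranking through its weights, while the final classifier is free to use a nonlinear kernel; in practice the number of retained features and the SVM parameters would be tuned by cross-validation rather than fixed as above.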