Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China.
College of Software, Jilin University, Changchun 130012, China.
Genes (Basel). 2021 Oct 24;12(11):1689. doi: 10.3390/genes12111689.
Long noncoding RNAs (lncRNAs) play crucial roles in many biological processes and participate in complex human diseases through interactions with proteins. Because identifying lncRNA-protein interactions experimentally is expensive and time-consuming, we propose LGFC-CNN, a novel deep learning method that combines raw sequence composition features, hand-designed features, and structure features to predict lncRNA-protein interactions. Two sequence preprocessing methods and two CNN modules (GloCNN and LocCNN) are used to extract global and local features from the raw sequences. Meanwhile, we select hand-designed features by comparing the predictive performance of different combinations of lncRNA and protein features. Furthermore, we obtain structure features and unify their dimensions through a Fourier transform. Finally, the four types of features are integrated to predict lncRNA-protein interactions comprehensively. Compared with other state-of-the-art methods on three lncRNA-protein interaction datasets, LGFC-CNN achieves the best performance, with accuracies of 94.14% on RPI21850, 92.94% on RPI7317, and 98.19% on RPI1847. These results show that LGFC-CNN can effectively predict lncRNA-protein interactions by combining raw sequence composition features, hand-designed features, and structure features.
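The abstract states only that structure features of differing lengths are brought to a common dimension through a Fourier transform. The following is a minimal sketch of one way such a step can work, not LGFC-CNN's published procedure. It assumes each lncRNA or protein yields a per-position numeric structure profile and keeps the length-normalised magnitudes of the first `k` discrete Fourier transform (DFT) coefficients. The helper name `fourier_fixed_length`, the choice of `k`, the normalisation, and the zero padding are all illustrative assumptions.

```python
import cmath
import math
from typing import List, Sequence


def fourier_fixed_length(values: Sequence[float], k: int = 8) -> List[float]:
    """Map a variable-length numeric profile to a fixed-length vector.

    Hypothetical helper (not from the LGFC-CNN paper): returns the
    magnitudes of the first `k` DFT coefficients, each divided by the
    sequence length n. Frequency bins at or beyond n are padded with 0.0,
    because a length-n DFT has only n distinct bins.
    """
    n = len(values)
    if n == 0:
        return [0.0] * k  # empty profile: all-zero feature vector
    out: List[float] = []
    for freq in range(k):
        if freq >= n:
            out.append(0.0)  # pad short sequences up to k features
            continue
        # X_freq = sum_t x_t * exp(-2*pi*i*freq*t/n)
        coeff = sum(
            x * cmath.exp(-2j * math.pi * freq * t / n)
            for t, x in enumerate(values)
        )
        out.append(abs(coeff) / n)
    return out


# Two profiles of different lengths map to vectors of the same length k.
long_profile = [0.2, 0.9, 0.4, 0.7, 0.1, 0.8, 0.3, 0.6, 0.5, 0.0]
short_profile = [0.5, 0.1, 0.9]
print(len(fourier_fixed_length(long_profile)), len(fourier_fixed_length(short_profile)))
```

Keeping only low-frequency magnitudes discards phase, so the vector does not depend on where along the sequence a periodic pattern starts; whether the original method keeps phase, uses more coefficients, or applies a different normalisation is not stated in the abstract.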