School of Computer Science, Hunan University of Technology, Zhuzhou, China.
College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, China.
Interdiscip Sci. 2022 Mar;14(1):209-232. doi: 10.1007/s12539-021-00483-y. Epub 2022 Jan 10.
Predicting lncRNA-protein interactions (LPIs) can deepen the understanding of many important biological processes. Artificial intelligence methods have reported many possible LPIs. However, most computational techniques were evaluated mainly on a single dataset, which may bias their predictions. More importantly, they were validated only under cross validation on lncRNA-protein pairs and did not consider performance under cross validation on lncRNAs or on proteins, so they cannot be shown to find related proteins for a new lncRNA or related lncRNAs for a new protein. This study proposes an ensemble learning framework (EnANNDeep), composed of an adaptive k-nearest neighbor classifier and deep models, to systematically find underlying linkages between lncRNAs and proteins. First, five LPI-related datasets are compiled. Second, features from multiple sources are integrated to describe each lncRNA-protein pair. Third, an adaptive k-nearest neighbor classifier, a deep neural network, and a deep forest are each designed to score unknown lncRNA-protein pairs. Finally, the interaction probabilities from the three predictors are integrated by soft voting. Compared with five classical LPI identification models (SFPEL, PMDKN, CatBoost, PLIPCOM, and LPI-SKF) under fivefold cross validation on lncRNAs, proteins, and LPIs, EnANNDeep achieves the best average AUCs of 0.8660, 0.8775, and 0.9166, respectively, and the best average AUPRs of 0.8545, 0.8595, and 0.9054, respectively, indicating superior LPI prediction ability. Case study analyses indicate that SNHG10 may be closely linked with Q15717. Within the ensemble framework, the adaptive k-nearest neighbor classifier picks the most appropriate k separately for each query lncRNA-protein pair. More importantly, the deep models (the deep neural network and the deep forest) can effectively learn representative features of lncRNAs and proteins.
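The soft-voting step and the cross validation on lncRNAs or proteins can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `soft_vote` and `folds_by_entity`, the equal default weights, the round-robin fold assignment, and the toy pair identifiers are all assumptions made for illustration, since the abstract does not specify them.

```python
def soft_vote(prob_lists, weights=None):
    """Combine per-pair interaction probabilities from several predictors.

    prob_lists: one list of probabilities per predictor, all the same length.
    weights: optional per-predictor weights (assumption: equal if omitted).
    """
    n_pairs = len(prob_lists[0])
    if any(len(p) != n_pairs for p in prob_lists):
        raise ValueError("every predictor must score the same pairs")
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    # Weighted average of the predictors' probabilities for each pair.
    return [
        sum(w * p[i] for w, p in zip(weights, prob_lists)) / total
        for i in range(n_pairs)
    ]


def folds_by_entity(pairs, index, n_folds=5):
    """Assign folds so that all pairs sharing an entity land in one fold.

    index=0 groups by lncRNA, index=1 groups by protein. Holding out a fold
    then simulates predicting for entities never seen during training.
    """
    entities = sorted({pair[index] for pair in pairs})
    # Round-robin assignment is an illustrative choice, not the paper's.
    fold_of = {entity: i % n_folds for i, entity in enumerate(entities)}
    return [fold_of[pair[index]] for pair in pairs]


if __name__ == "__main__":
    # Hypothetical scores for two pairs from three predictors
    # (e.g. adaptive kNN, deep neural network, deep forest).
    scores = soft_vote([[0.9, 0.2], [0.6, 0.4], [0.3, 0.6]])
    print(scores)

    # Hypothetical (lncRNA, protein) pairs, grouped by lncRNA.
    toy_pairs = [("L1", "P1"), ("L1", "P2"), ("L2", "P1")]
    print(folds_by_entity(toy_pairs, index=0, n_folds=2))
```

Grouping folds by entity rather than by pair is what distinguishes cross validation on lncRNAs or proteins from cross validation on LPIs: in the pair-level setting, the same lncRNA or protein may appear in both training and test folds.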