College of Information Science and Engineering, Hunan Normal University, Changsha, China.
Interdiscip Sci. 2023 Sep;15(3):439-451. doi: 10.1007/s12539-023-00573-z. Epub 2023 Jun 12.
A growing body of scientific evidence has revealed that long non-coding RNAs (lncRNAs) are involved in the progression of complex human diseases and in fundamental biological processes. Identifying novel, potentially disease-related lncRNAs can therefore aid the diagnosis, prognosis and therapy of many complex human diseases. Because traditional laboratory experiments are costly and time-consuming, a large number of computational algorithms have been proposed for predicting associations between lncRNAs and diseases. However, there is still much room for improvement. In this paper, we introduce an accurate framework named LDAEXC to infer LncRNA-Disease Associations with a deep autoencoder and an XGBoost Classifier. LDAEXC uses different similarity views of lncRNAs and human diseases to construct features for each data source. The constructed feature vectors are then fed into a deep autoencoder to obtain reduced features, and finally an XGBoost classifier uses the reduced features to compute latent lncRNA-disease association scores. Fivefold cross-validation experiments on four datasets showed that LDAEXC reached AUC scores of 0.9676 ± 0.0043, 0.9449 ± 0.022, 0.9375 ± 0.0331 and 0.9556 ± 0.0134, respectively, significantly higher than other advanced computational methods of the same kind. Extensive experimental results and case studies of two complex diseases (colon and breast cancers) further indicated the practicability and excellent prediction performance of LDAEXC in inferring unknown lncRNA-disease associations.
LDAEXC uses disease semantic similarity, lncRNA expression similarity, and the Gaussian interaction profile (GIP) kernel similarity of lncRNAs and diseases for feature construction. The constructed features are fed to a deep autoencoder to extract reduced features, and an XGBoost classifier then predicts lncRNA-disease associations from the reduced features.
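Of the three similarity views, the GIP kernel is the one computable from the known association matrix alone. In its commonly used form, the similarity of lncRNAs l_i and l_j is K(l_i, l_j) = exp(-γ ||IP(l_i) - IP(l_j)||²), where IP(l_i) is row i of the binary lncRNA-disease matrix and γ = γ' / ((1/n) Σ_k ||IP(l_k)||²); disease similarity uses the columns instead. The dependency-free sketch below follows that standard formulation with γ' = 1 as an assumption. The pair feature (an lncRNA's similarity row concatenated with a disease's) is a hypothetical construction for illustration, not necessarily how LDAEXC builds its features. Semantic and expression similarities need external data (a disease ontology, expression profiles) and are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]

def gip_kernel(profiles: Matrix, gamma_prime: float = 1.0) -> Matrix:
    """GIP kernel similarity between the rows of `profiles` (binary interaction profiles).
    gamma_prime = 1.0 is an assumed default, not a value taken from the paper."""
    n = len(profiles)
    # Bandwidth is normalised by the mean squared norm of all profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * dist_sq)
    return sim

def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]

def pair_feature(l_sim: Matrix, d_sim: Matrix, i: int, j: int) -> List[float]:
    """Hypothetical pair feature: lncRNA i's similarity row followed by disease j's."""
    return l_sim[i] + d_sim[j]

# Toy lncRNA-disease association matrix (rows: lncRNAs, columns: diseases).
A = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
]
lnc_gip = gip_kernel(A)             # 4 x 4 lncRNA similarity
dis_gip = gip_kernel(transpose(A))  # 3 x 3 disease similarity
x = pair_feature(lnc_gip, dis_gip, 0, 2)
print(len(x))  # 7
```

In a full pipeline, vectors like `x` (possibly extended with the semantic and expression views) would be what the deep autoencoder compresses before classification.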
Fivefold and tenfold cross-validation experiments on a benchmark dataset showed that LDAEXC could achieve AUC scores of 0.9676 and 0.9682, respectively, significantly higher than other state-of-the-art methods of the same kind.
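The reported AUC figures summarise scores over cross-validation folds. The sketch below shows that evaluation protocol in plain Python: stratified k-fold splitting, AUC computed as the probability that a random positive pair outscores a random negative one (ties count one half), and mean ± sample standard deviation across folds. Several parts are assumptions: `centroid_scorer` is a hypothetical stand-in for LDAEXC's autoencoder + XGBoost model, the toy data replace real reduced features, and whether the paper reports sample or population standard deviation (and how it samples negative pairs) is not stated here.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Row = List[float]

def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))

def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Deal shuffled positive and negative indices round-robin into k folds."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in (1, 0):
        idx = [i for i, t in enumerate(labels) if t == cls]
        rng.shuffle(idx)
        for n, i in enumerate(idx):
            folds[n % k].append(i)
    return folds

Scorer = Callable[[List[Row], List[int], List[Row]], List[float]]

def cross_validate(X: List[Row], y: List[int], fit_predict: Scorer, k: int = 5) -> Tuple[float, float]:
    """Return mean and sample standard deviation of the per-fold AUCs."""
    aucs = []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        scores = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                             [X[i] for i in test_idx])
        aucs.append(auc([y[i] for i in test_idx], scores))
    return statistics.mean(aucs), statistics.stdev(aucs)

def centroid_scorer(X_train: List[Row], y_train: List[int], X_test: List[Row]) -> List[float]:
    """Hypothetical placeholder model, NOT LDAEXC's autoencoder + XGBoost:
    a higher score means closer to the positive centroid than to the negative one."""
    def centroid(rows: List[Row]) -> Row:
        return [sum(col) / len(rows) for col in zip(*rows)]
    def sq_dist(a: Row, b: Row) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b))
    c_pos = centroid([x for x, t in zip(X_train, y_train) if t == 1])
    c_neg = centroid([x for x, t in zip(X_train, y_train) if t == 0])
    return [sq_dist(x, c_neg) - sq_dist(x, c_pos) for x in X_test]

# Toy data standing in for reduced lncRNA-disease pair features.
rng = random.Random(1)
X = ([[rng.gauss(1, 0.3), rng.gauss(1, 0.3)] for _ in range(50)]
     + [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(50)])
y = [1] * 50 + [0] * 50
mean_auc, sd_auc = cross_validate(X, y, centroid_scorer, k=5)
print(f"fivefold AUC = {mean_auc:.4f} ± {sd_auc:.4f}")
```

Calling `cross_validate(X, y, centroid_scorer, k=10)` gives the tenfold variant. Stratification keeps both classes in every fold, which the pairwise AUC definition requires.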