Tang Lili, Liu Longlong, Jiang Yan, Yuan Yi
School of Computer Science and Artificial Intelligence, Hunan University of Technology, Zhuzhou, 412007, China.
School of Biological Science and Medical Engineering, Hunan University of Technology, Zhuzhou, 412007, China.
Sci Rep. 2025 Aug 26;15(1):31407. doi: 10.1038/s41598-025-16177-0.
Long noncoding RNAs (lncRNAs) are important regulators of, and promising targets for, complex diseases, and they have shown close relationships with various diseases. Although laboratory techniques have validated many lncRNA-disease associations (LDAs), these techniques are costly, laborious, and time-consuming. This study introduces LDA-GMCB, an LDA inference model that combines graph embedding learning, a multi-head self-attention mechanism (MSA) with a convolutional neural network (CNN), low-rank singular value decomposition (SVD), and histogram-based gradient boosting (HGBoost). For all lncRNAs and diseases, LDA-GMCB first extracts nonlinear features by combining graph embedding learning with MSA and CNN, then captures linear features through low-rank SVD, and finally infers lncRNA-disease relationships with HGBoost. Under 5-fold cross-validation and two cold-start scenarios, LDA-GMCB was compared with four baselines (i.e., SDLDA, LDNFSGB, IPCARF, and LDA-VGHB) and four popular classifiers (i.e., multi-layer perceptron, SVM, random forest, and XGBoost). An ablation study was also conducted. The results demonstrated that, under most conditions, LDA-GMCB substantially outperformed these models and achieved significant improvements on two public databases (i.e., lncRNADisease and MNDR). Moreover, LDA-GMCB was applied to infer potential lncRNAs for Alzheimer's disease and Parkinson's disease, and it predicted that DGCR5 and HIF1A may be associated with these two diseases, respectively. We hope that LDA-GMCB will help infer potential lncRNAs for various complex diseases. LDA-GMCB is freely available at https://github.com/smiling199/LDA-GMCB .