Tang Lili, Huang Liangliang, Yuan Yi
School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, China.
School of Information Technology and Administration, Hunan University of Finance and Economics, Changsha, 410125, China.
Sci Rep. 2025 May 31;15(1):19178. doi: 10.1038/s41598-025-03269-0.
Long noncoding RNAs (lncRNAs) are closely associated with many human diseases. Identifying new lncRNA-disease associations (LDAs) helps to decipher disease mechanisms, find new biomarkers, and advance diagnosis and treatment. In this study, we propose an LDA prediction framework called LDA-GARB. LDA-GARB first applies nonnegative matrix factorization to extract linear features of lncRNAs and diseases. Next, it computes lncRNA similarity and disease similarity and uses a graph autoencoder to extract nonlinear features of lncRNAs and diseases. The extracted linear and nonlinear features are then concatenated into a single vector. Finally, this vector is fed to a noise-robust gradient boosting model that scores unknown lncRNA-disease pairs to uncover potential associations. To evaluate LDA-GARB, we used precision, recall, accuracy, F1-score, AUC, and AUPR as metrics and performed multiple comparison experiments. First, LDA-GARB was benchmarked against four representative LDA prediction methods, i.e., SDLDA, LDNFSGB, LDAenDL, and LDA-VGHB, under 5-fold cross-validation on lncRNAs, on diseases, and on lncRNA-disease pairs. Next, it was compared with four representative boosting models, i.e., XGBoost, AdaBoost, CatBoost, and LightGBM, under the same three cross-validation settings. Subsequently, its performance on imbalanced data was evaluated against LDA-LNSUBRW, GAMCLDA, LDA-VGHB, LDAGM, and GANLDA. We also performed parameter sensitivity analysis and ablation experiments. The results demonstrated that LDA-GARB improved LDA prediction. Finally, LDA-GARB was applied to predict lncRNAs potentially associated with colorectal cancer and breast cancer. CCDC26 and HAR1A were inferred to be associated with colorectal cancer and breast cancer, respectively. As a useful LDA identification tool, LDA-GARB is freely available at https://github.com/smiling199/LDA-GARB .
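The three cross-validation settings named in the abstract can be illustrated with a short sketch. This is not taken from the LDA-GARB repository. It assumes the common reading of these settings: cross-validation on lncRNAs holds out whole rows of the association matrix, cross-validation on diseases holds out whole columns, and cross-validation on pairs holds out individual entries. The function name `k_fold_splits` and its parameters are hypothetical.

```python
import random


def k_fold_splits(n_lncrnas, n_diseases, mode, k=5, seed=0):
    """Yield (train_pairs, test_pairs) for k-fold cross-validation.

    Assumed interpretation (not confirmed by the source):
      mode="lncrna":  hold out whole lncRNAs (rows of the association matrix)
      mode="disease": hold out whole diseases (columns)
      mode="pair":    hold out individual lncRNA-disease pairs (entries)
    """
    rng = random.Random(seed)
    # Every (lncRNA index, disease index) entry of the association matrix.
    pairs = [(i, j) for i in range(n_lncrnas) for j in range(n_diseases)]

    if mode == "lncrna":
        units = list(range(n_lncrnas))
        unit_of = lambda pair: pair[0]
    elif mode == "disease":
        units = list(range(n_diseases))
        unit_of = lambda pair: pair[1]
    elif mode == "pair":
        units = list(pairs)
        unit_of = lambda pair: pair
    else:
        raise ValueError("mode must be 'lncrna', 'disease', or 'pair'")

    # Shuffle the units, then deal them round-robin into k folds.
    rng.shuffle(units)
    folds = [set(units[f::k]) for f in range(k)]

    for held_out in folds:
        test = [p for p in pairs if unit_of(p) in held_out]
        train = [p for p in pairs if unit_of(p) not in held_out]
        yield train, test


if __name__ == "__main__":
    for mode in ("lncrna", "disease", "pair"):
        sizes = [len(test) for _, test in k_fold_splits(10, 5, mode)]
        print(mode, sizes)
```

Under this reading, the lncRNA setting is the hardest for lncRNA-side features: every held-out lncRNA has no known associations in the training fold, which mimics predicting diseases for a newly discovered lncRNA. The disease setting is the mirror case, and the pair setting only hides individual associations.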