School of Computer Science, Qufu Normal University, Rizhao 276826, China.
Genes (Basel). 2022 Nov 4;13(11):2032. doi: 10.3390/genes13112032.
Long non-coding RNA (lncRNA) is a transcription product that exerts its biological functions through a variety of mechanisms. The occurrence and development of many human diseases are closely related to abnormal lncRNA expression levels. Many computational models have been developed to identify lncRNA-disease associations (LDAs), yet many potential LDAs remain unknown. In this paper, a novel method named MSF-UBRW (multiple similarities fusion based on unbalanced bi-random walk) is designed to explore new LDAs. First, two similarities of lncRNAs (functional similarity and Gaussian interaction profile kernel similarity) are calculated and fused linearly, and the same is done for diseases. Then, the known association matrix is preprocessed. Next, the linear neighbor similarities of lncRNAs and of diseases are calculated. After that, potential associations are predicted with an unbalanced bi-random walk. The fusion of multiple similarities improves the prediction performance of MSF-UBRW to a large extent. Finally, the prediction ability of MSF-UBRW is evaluated with two validation schemes, leave-one-out cross-validation (LOOCV) and 5-fold cross-validation (5-fold CV). AUCs of 0.9391 in LOOCV and 0.9183 (±0.0054) in 5-fold CV confirm the reliable prediction ability of the method. Case studies of three common diseases also show that MSF-UBRW can infer new LDAs effectively.
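The abstract names the main steps but gives no formulas, so the following is only a minimal sketch of how such a pipeline might look, not the authors' implementation. It assumes the commonly used definition of the Gaussian interaction profile kernel, a weighted average for the linear fusion, symmetric degree normalization of the similarity matrices, and a bi-random walk in which the lncRNA side and the disease side take different numbers of steps. The function names, the toy matrices, and the parameter values (`gamma_prime`, `weight`, `alpha`, `left_steps`, `right_steps`) are all hypothetical. The preprocessing and linear neighbor similarity steps are omitted.

```python
import numpy as np


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`."""
    sq_norms = (profiles ** 2).sum(axis=1)
    mean_norm = sq_norms.mean()
    # Bandwidth scaled by the mean squared profile norm (assumed convention).
    gamma = gamma_prime / mean_norm if mean_norm > 0 else gamma_prime
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def fuse(sim_a, sim_b, weight=0.5):
    """Linear fusion of two similarity matrices (weight is an assumption)."""
    return weight * sim_a + (1.0 - weight) * sim_b


def sym_normalize(sim):
    """Symmetric degree normalization: D^(-1/2) S D^(-1/2)."""
    degree = sim.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    inv_sqrt[degree > 0] = 1.0 / np.sqrt(degree[degree > 0])
    return sim * inv_sqrt[:, None] * inv_sqrt[None, :]


def unbalanced_bi_random_walk(lnc_sim, dis_sim, assoc,
                              alpha=0.8, left_steps=3, right_steps=2):
    """Walk on the lncRNA and disease networks with different step limits."""
    lnc_norm = sym_normalize(lnc_sim)
    dis_norm = sym_normalize(dis_sim)
    start = assoc / assoc.sum()  # known associations as the restart term
    scores = start.copy()
    for step in range(1, max(left_steps, right_steps) + 1):
        walks = []
        if step <= left_steps:   # walk on the lncRNA similarity network
            walks.append(alpha * lnc_norm @ scores + (1 - alpha) * start)
        if step <= right_steps:  # walk on the disease similarity network
            walks.append(alpha * scores @ dis_norm + (1 - alpha) * start)
        scores = sum(walks) / len(walks)
    return scores


# Toy data: 4 lncRNAs x 3 diseases; 1 marks a known association.
assoc = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [0, 1, 0],
                  [0, 0, 1]], dtype=float)

# Placeholder "functional" similarities (hypothetical values, not real data).
lnc_func = np.array([[1.0, 0.6, 0.2, 0.1],
                     [0.6, 1.0, 0.5, 0.1],
                     [0.2, 0.5, 1.0, 0.3],
                     [0.1, 0.1, 0.3, 1.0]])
dis_func = np.array([[1.0, 0.4, 0.1],
                     [0.4, 1.0, 0.2],
                     [0.1, 0.2, 1.0]])

lnc_gip = gip_kernel(assoc)      # rows are lncRNA interaction profiles
dis_gip = gip_kernel(assoc.T)    # rows are disease interaction profiles
lnc_sim = fuse(lnc_func, lnc_gip)
dis_sim = fuse(dis_func, dis_gip)

scores = unbalanced_bi_random_walk(lnc_sim, dis_sim, assoc)
print(np.round(scores, 4))
```

In a sketch like this, a higher entry in `scores` for a pair with no known association would be read as a stronger candidate LDA. Under LOOCV, each known association would be set to zero in turn before recomputing the scores and ranking the held-out pair.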