Li Shijun, Chang Miaomiao, Tong Ling, Wang Yuehua, Wang Meng, Wang Fang
Department of Pathology, Chifeng Municipal Hospital, Chifeng, China.
Front Genet. 2023 Jan 20;13:1023615. doi: 10.3389/fgene.2022.1023615. eCollection 2022.
Breast cancer and colorectal cancer are two of the most common malignant tumors worldwide and are among the leading causes of cancer mortality. Many studies have demonstrated that long noncoding RNAs (lncRNAs) are closely linked to the occurrence and development of both cancers. It is therefore essential to develop an effective way to identify potential lncRNA biomarkers for them. In this study, we developed a computational method, LDA-RWLMF, which integrates random walk with restart and logistic matrix factorization, to investigate the roles of lncRNA biomarkers in the prognosis and diagnosis of the two cancers. First, we fuse disease semantic similarity with disease Gaussian association profile similarity, and lncRNA functional similarity with lncRNA Gaussian association profile similarity. Second, we design a negative selection algorithm based on random walk to extract negative lncRNA-disease associations (LDAs). Third, we develop a logistic matrix factorization model to predict possible LDAs. We compare LDA-RWLMF with four classical LDA prediction methods, namely LNCSIM1, LNCSIM2, ILNCSIM, and IDSSIM. In 5-fold cross-validation on the MNDR dataset, LDA-RWLMF achieves the best AUC, 0.9312, outperforming all four methods. Finally, having established the performance of LDA-RWLMF, we rank all lncRNAs as candidate biomarkers for each of the two cancers. We find that 48 and 50 lncRNAs, respectively, have the highest association scores with breast cancer and colorectal cancer among all lncRNAs known to be associated with them in the MNDR dataset. We predict that the lncRNAs HULC and HAR1A could be potential biomarkers for breast cancer and colorectal cancer, respectively; these predictions require biomedical experimental validation.
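The three computational steps summarized in the abstract (similarity fusion, random-walk-based negative selection, and logistic matrix factorization) can be illustrated with a minimal NumPy sketch. This is not the authors' LDA-RWLMF implementation. The abstract does not specify the fusion rule, how negatives are picked from the random-walk scores, or how the similarities enter the factorization. The averaging fusion rule, the "lowest steady-state score" negative selection, the graph-Laplacian regularizer, all hyperparameters, and the function names (`gip_similarity`, `fuse`, `select_negatives`, `logistic_mf`) are therefore assumptions made for illustration. The toy association matrix is made up, and `np.eye` stands in for the functional and semantic similarities, which are not given here.

```python
import numpy as np


def gip_similarity(A):
    """Gaussian interaction profile (GIP) kernel between the rows of a 0/1 matrix."""
    sq_norms = np.sum(A ** 2, axis=1)
    mean_sq_norm = sq_norms.mean()
    gamma = 1.0 / mean_sq_norm if mean_sq_norm > 0 else 1.0  # bandwidth scaled by mean profile norm
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (A @ A.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def fuse(S_prior, K_gip):
    """Hypothetical fusion rule: average where a prior similarity exists, else use GIP."""
    return np.where(S_prior > 0, (S_prior + K_gip) / 2.0, K_gip)


def random_walk_with_restart(W, p0, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) T p + r p0 on the column-normalised similarity graph W."""
    col_sums = W.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    T = W / col_sums  # column-stochastic transition matrix
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (T @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def select_negatives(A, S_l, n_per_disease, restart=0.7):
    """Hypothetical negative selection: walk from each disease's known lncRNAs and
    label the unlabelled lncRNAs with the lowest steady-state scores as negatives."""
    neg = np.zeros_like(A)
    for j in range(A.shape[1]):
        pos = A[:, j] > 0
        if not pos.any():
            continue
        scores = random_walk_with_restart(S_l, pos / pos.sum(), restart)
        candidates = np.flatnonzero(~pos)
        neg[candidates[np.argsort(scores[candidates])[:n_per_disease]], j] = 1
    return neg


def laplacian(S):
    return np.diag(S.sum(axis=1)) - S


def logistic_mf(A, mask, S_l, S_d, rank=3, lr=0.05, reg=0.01, alpha=0.1,
                epochs=500, seed=0):
    """Logistic MF: P(association) = sigmoid(U V^T), fitted by gradient descent on the
    entries where mask == 1, with L2 and (assumed) Laplacian similarity regularisation."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((A.shape[0], rank))
    V = 0.1 * rng.standard_normal((A.shape[1], rank))
    L_l, L_d = laplacian(S_l), laplacian(S_d)
    for _ in range(epochs):
        P = 1.0 / (1.0 + np.exp(-(U @ V.T)))
        G = mask * (P - A)  # log-loss gradient w.r.t. the logits, labelled entries only
        grad_U = G @ V + reg * U + alpha * (L_l @ U)
        grad_V = G.T @ U + reg * V + alpha * (L_d @ V)
        U -= lr * grad_U
        V -= lr * grad_V
    return 1.0 / (1.0 + np.exp(-(U @ V.T)))


# Made-up toy data: lncRNAs 0-2 are linked to diseases 0-1, lncRNAs 3-5 to diseases 2-3.
A = np.zeros((6, 4))
A[:3, :2] = 1.0
A[3:, 2:] = 1.0
S_l = fuse(np.eye(6), gip_similarity(A))    # np.eye: placeholder lncRNA functional similarity
S_d = fuse(np.eye(4), gip_similarity(A.T))  # np.eye: placeholder disease semantic similarity
neg = select_negatives(A, S_l, n_per_disease=2)
scores = logistic_mf(A, mask=A + neg, S_l=S_l, S_d=S_d)
```

In this sketch the Laplacian term pulls the latent vectors of similar lncRNAs (or diseases) toward each other, which is one common way to let fused similarities inform a factorization; it is only one possible design, and the original method may combine the similarities differently.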