School of Electronic and Information Engineering, Suzhou University of Science and Technology, 215009 Suzhou, China.
Suzhou Research Center of Medical School, Suzhou Hospital, Affiliated Hospital of Medical School, Nanjing University, 215153 Suzhou, China.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae481.
Predicting associations between microbes and diseases opens up new avenues for developing diagnostic, preventive, and therapeutic strategies. Given that laboratory-based biological tests to verify these associations are often time-consuming and expensive, there is a critical need for innovative computational frameworks to predict new microbe-disease associations. In this work, we introduce a novel prediction algorithm called Predicting Human Disease-Microbe Associations using Cross-Domain Matrix Factorization (CMFHMDA). Initially, we calculate the composite similarity of diseases and the Gaussian interaction profile similarity of microbes. We then apply the Weighted K Nearest Known Neighbors (WKNKN) algorithm to refine the microbe-disease association matrix. Our CMFHMDA model is subsequently developed by integrating the network data of both microbes and diseases to predict potential associations. The key innovations of this method include using the WKNKN algorithm to preprocess missing values in the association matrix and incorporating cross-domain information from microbes and diseases into the CMFHMDA model. To validate CMFHMDA, we employed three different cross-validation techniques to evaluate the model's accuracy. The results indicate that the CMFHMDA model achieved Area Under the Receiver Operating Characteristic Curve scores of 0.9172, 0.8551, and 0.9351 ± 0.0052 in global Leave-One-Out Cross-Validation (LOOCV), local LOOCV, and five-fold CV, respectively. Furthermore, many predicted associations have been confirmed by published experimental studies, establishing CMFHMDA as an effective tool for predicting potential disease-associated microbes.
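The two preprocessing steps named in the abstract, Gaussian interaction profile (GIP) similarity and WKNKN refinement, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas or parameter values, so the sketch assumes the commonly used GIP kernel (bandwidth normalised by the mean squared profile norm) and a simplified WKNKN that excludes each row from its own neighbour list. The function names, the toy matrix, the rows-as-diseases orientation, and the settings `k=2` and `eta=0.5` are all hypothetical, and GIP similarity stands in for the paper's composite disease similarity.

```python
import math


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def gip_similarity(profiles, gamma_prime=1.0):
    """GIP kernel between the rows of a binary association matrix.

    Assumed convention: gamma = gamma_prime / mean squared row norm.
    """
    n = len(profiles)
    mean_sq = sum(sum(v * v for v in row) for row in profiles) / n
    gamma = gamma_prime / mean_sq if mean_sq > 0 else 0.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * dist_sq)
    return sim


def neighbour_estimate(assoc, sim, k, eta):
    """Estimate each row from its k most similar 'known' rows.

    A row counts as known if it has at least one association. Neighbour
    weights decay by eta per rank and are scaled by similarity.
    """
    n = len(assoc)
    known = [i for i in range(n) if any(assoc[i])]
    estimates = []
    for i in range(n):
        nbrs = sorted((j for j in known if j != i),
                      key=lambda j: sim[i][j], reverse=True)[:k]
        norm = sum(sim[i][j] for j in nbrs)
        row = [0.0] * len(assoc[0])
        if norm > 0:
            for rank, j in enumerate(nbrs):
                weight = (eta ** rank) * sim[i][j]
                for c, value in enumerate(assoc[j]):
                    row[c] += weight * value / norm
        estimates.append(row)
    return estimates


def wknkn(assoc, sim_rows, sim_cols, k=2, eta=0.5):
    """Replace entries with the larger of the original value and the
    average of the row-side and column-side neighbour estimates."""
    by_rows = neighbour_estimate(assoc, sim_rows, k, eta)
    by_cols = transpose(neighbour_estimate(transpose(assoc), sim_cols, k, eta))
    return [[max(assoc[r][c], (by_rows[r][c] + by_cols[r][c]) / 2)
             for c in range(len(assoc[0]))]
            for r in range(len(assoc))]


# Toy data (hypothetical): 3 diseases (rows) x 3 microbes (columns).
assoc = [[1, 0, 1],
         [1, 0, 0],
         [0, 0, 0]]
sim_d = gip_similarity(assoc)             # stand-in for composite disease similarity
sim_m = gip_similarity(transpose(assoc))  # microbe GIP similarity
refined = wknkn(assoc, sim_d, sim_m)
```

In this sketch, known associations are never lowered, while a disease with no recorded associations (the third row) receives fractional scores borrowed from its most similar neighbours. The refined matrix would then be the input to the factorization step.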