Wang Liugen, Wang Yan, Xuan Chenxu, Zhang Bai, Wu Hanwen, Gao Jie
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China.
School of Science, Jiangnan University, Wuxi, Jiangsu 214122, China.
Brief Bioinform. 2023 Jul 20;24(4). doi: 10.1093/bib/bbad255.
Studies have confirmed that the occurrence of many complex human diseases is closely related to the microbial community, and that microbes can affect tumorigenesis and metastasis by regulating the tumor microenvironment. However, large gaps remain in the clinical observation of the microbiota in disease. Although biological experiments identify disease-associated microbes accurately, they are time-consuming and expensive. Computational models that effectively identify disease-related microbes can shorten this process and reduce its cost in money and time. In this paper, we present a model named DSAE_RF that predicts latent microbe-disease associations by combining multi-source features with deep learning. DSAE_RF calculates four similarities between microbes and diseases, which are then used as feature vectors for the disease-microbe pairs. Reliable negative samples are then screened by k-means clustering, and a deep sparse autoencoder neural network is used to extract effective features of the disease-microbe pairs. On this basis, a random forest classifier is used to predict the associations between microbes and diseases. To assess the performance of the model, 10-fold cross-validation is performed on the same dataset, yielding an AUC of 0.9448 and an AUPR of 0.9431. We also conduct a variety of further experiments, including a comparison of negative sample selection methods, comparisons with different models and classifiers, the Kolmogorov-Smirnov test and t-test, ablation experiments, robustness analysis, and case studies on COVID-19 and colorectal cancer. The results demonstrate the reliability and usability of our model.
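The abstract states that reliable negative samples are screened by k-means clustering but does not say how the clusters are used. The sketch below illustrates one common scheme, assumed here and not taken from the paper: cluster the unlabeled disease-microbe pairs, then draw an equal number of pairs from each cluster so that the negatives cover the feature space. The function names (`kmeans`, `select_negatives`), the choice of k = 3, and the toy two-dimensional features are all hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k random points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_vector(members)
    return labels


def select_negatives(unlabeled, n_needed, k=3, seed=0):
    """Pick n_needed indices of unlabeled pairs, spread evenly over k clusters.

    Assumption: equal draws per cluster; the paper may screen differently.
    """
    rng = random.Random(seed)
    labels = kmeans(unlabeled, k, seed=seed)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        chosen.extend(rng.sample(members, min(n_needed // k, len(members))))
    # Top up from the remaining pairs if some clusters were too small.
    chosen_set = set(chosen)
    leftovers = [i for i in range(len(unlabeled)) if i not in chosen_set]
    rng.shuffle(leftovers)
    chosen.extend(leftovers[: n_needed - len(chosen)])
    return sorted(chosen)


# Toy data: 60 unlabeled pairs with 2-D features drawn around three centres.
data_rng = random.Random(1)
unlabeled = [
    [data_rng.gauss(cx, 0.3), data_rng.gauss(cy, 0.3)]
    for cx, cy in [(0, 0), (3, 3), (0, 3)]
    for _ in range(20)
]
negatives = select_negatives(unlabeled, n_needed=12, k=3)
```

In a full pipeline, the selected indices would be paired with the known positive associations to form a balanced training set before feature extraction and classification.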