Chen Xingjian, Zhu Zifan, Zhang Weitong, Wang Yuchen, Wang Fuzhou, Yang Jianyi, Wong Ka-Chun
Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR.
Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA.
iScience. 2022 Mar 16;25(4):104081. doi: 10.1016/j.isci.2022.104081. eCollection 2022 Apr 15.
Human disease prediction from microbiome data has broad implications in metagenomics. Existing methods rarely consider abundance profiles from both known and unknown microbial organisms or capture the taxonomic relationships among microbial taxa, which leads to significant information loss. Deep learning, on the other hand, has shown unprecedented advantages in classification tasks owing to its feature-learning ability. In metagenome-based disease prediction, however, it faces the opposite situation: high-dimensional, low-sample-size metagenomic datasets can lead to severe overfitting, and black-box models fail to provide biological explanations. To circumvent these problems, we developed MetaDR, a comprehensive machine learning-based framework that integrates various sources of information with deep learning to predict human diseases. Experimental results indicate that MetaDR achieves competitive prediction performance with reduced running time and effectively discovers informative features that offer biological insights.