IEEE J Biomed Health Inform. 2020 Oct;24(10):2993-3001. doi: 10.1109/JBHI.2020.2993761. Epub 2020 May 11.
Accurate prediction of the host phenotype from a metagenomic sample and identification of the associated microbial markers are important in understanding potential host-microbiome interactions related to disease initiation and progression. We introduce PopPhy-CNN, a novel convolutional neural network (CNN) learning framework that effectively exploits phylogenetic structure in microbial taxa for host phenotype prediction. Our approach takes as input a 2D matrix representing the phylogenetic tree populated with the relative abundance of microbial taxa in a metagenomic sample. This conversion enables CNNs to explore the spatial relationship of the taxonomic annotations on the tree and their quantitative characteristics in metagenomic data. We show the competitiveness of our model compared to other available methods using nine metagenomic datasets of moderate size for binary classification. With synthetic and biological datasets, we show the superior and robust performance of our model for multi-class classification. Furthermore, we design a novel scheme for feature extraction from the learned CNN models and demonstrate improved performance when the extracted features are used. PopPhy-CNN is a practical deep learning framework for the prediction of host phenotype that also facilitates the retrieval of predictive microbial taxa.
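The tree-to-matrix input described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tree, the taxa abundances, the function names (`populate`, `tree_to_matrix`), and the simple layout (one row per tree level, nodes packed left to right, zero padding) are all assumptions made for illustration. Each internal node is assumed to carry the sum of the abundances of its descendants.

```python
# Hypothetical toy tree: parent -> list of children (leaves have no entry).
TREE = {
    "root": ["Firmicutes", "Bacteroidetes"],
    "Firmicutes": ["Lactobacillus", "Clostridium"],
    "Bacteroidetes": ["Bacteroides"],
}


def populate(tree, node, leaf_abundance, totals):
    """Store in totals[node] the leaf abundance, or the sum over its children."""
    children = tree.get(node, [])
    if not children:
        totals[node] = leaf_abundance.get(node, 0.0)
    else:
        totals[node] = sum(
            populate(tree, child, leaf_abundance, totals) for child in children
        )
    return totals[node]


def tree_to_matrix(tree, root, leaf_abundance):
    """Lay the populated tree out as a 2D list: one row per tree level.

    Assumed layout (illustrative only): nodes of each level are written
    left to right in traversal order, and unused cells are padded with 0.0.
    """
    totals = {}
    populate(tree, root, leaf_abundance, totals)

    # Collect the nodes of each level with a breadth-first sweep.
    levels = []
    current = [root]
    while current:
        levels.append(current)
        current = [child for node in current for child in tree.get(node, [])]

    width = max(len(level) for level in levels)
    matrix = [
        [totals[node] for node in level] + [0.0] * (width - len(level))
        for level in levels
    ]
    return matrix, levels


# Hypothetical relative abundances for one metagenomic sample.
sample = {"Lactobacillus": 0.5, "Clostridium": 0.25, "Bacteroides": 0.25}
matrix, levels = tree_to_matrix(TREE, "root", sample)
for row in matrix:
    print(row)
```

Because neighbouring cells in such a matrix tend to hold related taxa, a 2D convolution over it can pick up local patterns that follow the tree structure, which is the intuition the abstract describes.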