Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel.
The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel.
Gut Microbes. 2023 Jan-Dec;15(1):2224474. doi: 10.1080/19490976.2023.2224474.
The human gut microbiome is associated with a large number of disease etiologies. As such, it is a natural candidate for machine-learning-based biomarker development for multiple diseases and conditions. The microbiome is often analyzed using 16S rRNA gene sequencing or shotgun metagenomics. However, several properties of microbial sequence-based studies hinder machine learning (ML), including non-uniform representation, a small number of samples compared with the dimension of each sample, and sparsity of the data, with most taxa present in only a small subset of samples. We show here, using a graph representation, that the cladogram structure is as informative as the taxa frequency. We then propose a novel method that uses microbial taxonomy to combine information from different taxa and improve data representation for ML. iMic (image microbiome) translates the microbiome to images through an iterative ordering scheme and applies convolutional neural networks to the resulting images. We show that iMic has a higher precision in static microbiome gene sequence-based ML than state-of-the-art methods. iMic also facilitates the interpretation of the classifiers: applying an explainable artificial intelligence (AI) algorithm to iMic detects the taxa relevant to each condition. iMic is then extended to dynamic microbiome samples by translating them to movies.
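The idea of turning a taxonomy into an image can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `taxonomy_to_image`, the toy lineages, and the abundance values are all hypothetical, each internal node is assumed to take the mean of its descendant leaves, and plain lexicographic sorting stands in for the iterative ordering scheme mentioned above. Each row of the image is a taxonomic level and each column is a leaf taxon, so related taxa end up in neighboring pixels.

```python
import numpy as np

def taxonomy_to_image(abundances, n_levels):
    """Map one sample to a (n_levels x n_leaves) array.

    abundances: dict from a lineage tuple of length n_levels to a value.
    Assumption: an internal node takes the mean of its descendant leaves.
    """
    # Lexicographic sorting keeps sibling taxa in adjacent columns
    # (a simplification, not the iterative ordering used by iMic).
    leaves = sorted(abundances)
    image = np.zeros((n_levels, len(leaves)))
    for level in range(n_levels):
        # Group leaf columns by their ancestor at this level.
        groups = {}
        for col, lineage in enumerate(leaves):
            groups.setdefault(lineage[: level + 1], []).append(col)
        # Spread each ancestor's mean value over its descendants' columns.
        for cols in groups.values():
            image[level, cols] = np.mean([abundances[leaves[c]] for c in cols])
    return image

# Hypothetical toy sample with two levels (phylum, class).
sample = {
    ("Firmicutes", "Clostridia"): 2.0,
    ("Firmicutes", "Bacilli"): 4.0,
    ("Bacteroidetes", "Bacteroidia"): 1.0,
}
image = taxonomy_to_image(sample, n_levels=2)
```

An array of this kind could then be passed to a convolutional network as a single-channel image. Stacking the arrays from consecutive time points would give the movie-like input described for dynamic samples.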