Mreyoud Yassin, Song Myoungkyu, Lim Jihun, Ahn Tae-Hyuk
Program in Bioinformatics and Computational Biology, Saint Louis University, Saint Louis, MO 63104, USA.
Department of Computer Science, University of Nebraska Omaha, Omaha, NE 68182, USA.
Life (Basel). 2022 Apr 30;12(5):669. doi: 10.3390/life12050669.
The diversity within the microbial communities that drive biogeochemical processes influences many phenotypes. Analyses of these communities and their diversity by numerous microbiome projects have revealed the important role of metagenomics in understanding the complex relationship between microbes and their environments. This relationship can be understood through the microbiome composition of specific, known environments, and these compositions can then serve as templates for predicting the status of similar environments. Machine learning is a key component of this predictive task, and several published analysis tools already apply machine learning methods to metagenomic analysis. Despite these earlier models, the performance of deep neural networks remains under-researched. Given the nature of metagenomic data, deep neural networks could substantially improve prediction accuracy in metagenomic analysis applications. To address this gap, we present a deep learning based tool that uses a deep neural network for phenotypic prediction of unknown metagenomic samples. (1) First, our tool takes as input taxonomic profiles from 16S or WGS sequencing data. (2) Second, given the samples, our tool builds a deep neural network model by computing multi-level classification. (3) Lastly, given the model, our tool classifies an unknown sample from its unlabeled taxonomic profile. In benchmark experiments, we observed that an analysis method employing a deep neural network, such as our tool, shows promising improvements in prediction accuracy on several samples compared with other machine learning models.
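The three steps described in the abstract (profiles in, model built, unknown sample classified) can be illustrated with a minimal sketch. This is not the authors' tool or its architecture: the network below is a small feed-forward classifier written with only the Python standard library, the abundance values are made up, and the names `TinyMLP`, `train_model`, and `classify` are hypothetical. It also does not attempt the multi-level classification mentioned in step (2), which the abstract does not specify.

```python
import math
import random

# Step 1 (assumed input format): relative abundances over a fixed list of taxa.
# The genera are real, but every number here is invented for illustration.
TAXA = ["Bacteroides", "Faecalibacterium", "Streptomyces", "Bradyrhizobium"]
LABELS = ["gut", "soil"]
TRAIN = [
    ([0.55, 0.35, 0.05, 0.05], "gut"),
    ([0.60, 0.25, 0.10, 0.05], "gut"),
    ([0.45, 0.40, 0.05, 0.10], "gut"),
    ([0.05, 0.10, 0.50, 0.35], "soil"),
    ([0.10, 0.05, 0.40, 0.45], "soil"),
    ([0.05, 0.05, 0.60, 0.30], "soil"),
]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class TinyMLP:
    """Feed-forward network with tanh hidden layers and a softmax output."""

    def __init__(self, sizes, seed=0):
        rng = random.Random(seed)
        # One (weights, biases) pair per layer; W[j][i] links input i to unit j.
        self.layers = [
            ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])
        ]

    def forward(self, x):
        """Return the activations of every layer, with the input first."""
        acts = [x]
        for k, (W, b) in enumerate(self.layers):
            z = [sum(w * a for w, a in zip(row, acts[-1])) + bj
                 for row, bj in zip(W, b)]
            last = k == len(self.layers) - 1
            acts.append(softmax(z) if last else [math.tanh(v) for v in z])
        return acts

    def train_step(self, x, y, lr):
        """One stochastic gradient step; returns the pre-update cross-entropy."""
        acts = self.forward(x)
        probs = acts[-1]
        # Gradient of cross-entropy with respect to the softmax inputs.
        delta = [p - (1.0 if j == y else 0.0) for j, p in enumerate(probs)]
        for k in range(len(self.layers) - 1, -1, -1):
            W, b = self.layers[k]
            a_in = acts[k]
            if k > 0:
                # Back-propagate through tanh before the weights are changed.
                prev = [sum(W[j][i] * delta[j] for j in range(len(W)))
                        * (1.0 - a_in[i] ** 2) for i in range(len(a_in))]
            for j, row in enumerate(W):
                for i in range(len(row)):
                    row[i] -= lr * delta[j] * a_in[i]
                b[j] -= lr * delta[j]
            if k > 0:
                delta = prev
        return -math.log(max(probs[y], 1e-12))


def train_model(samples, hidden=(8, 8), epochs=1000, lr=0.2, seed=0):
    """Step 2: fit the network on labeled taxonomic profiles."""
    rng = random.Random(seed)
    model = TinyMLP([len(TAXA), *hidden, len(LABELS)], seed=seed)
    data = [(x, LABELS.index(label)) for x, label in samples]
    history = []  # mean training loss per epoch
    for _ in range(epochs):
        rng.shuffle(data)
        history.append(sum(model.train_step(x, y, lr) for x, y in data) / len(data))
    return model, history


def classify(model, profile):
    """Step 3: predict the phenotype label of an unlabeled profile."""
    probs = model.forward(profile)[-1]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


model, history = train_model(TRAIN)
label, probs = classify(model, [0.50, 0.30, 0.10, 0.10])
print(label, [round(p, 3) for p in probs])
```

A real workflow would read profiles produced by a taxonomic profiler, hold out samples for evaluation, and most likely use an established deep learning library rather than hand-written backpropagation. The sketch only makes the data flow between the three steps concrete.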