Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua-Peking Center for Life Sciences, Tsinghua University, Beijing, China.
EPSRC/BBSRC Future Biomanufacturing Research Hub, EPSRC Synthetic Biology Research Centre SYNBIOCHEM, Manchester Institute of Biotechnology and School of Chemistry, The University of Manchester, Manchester, United Kingdom.
Microbiol Spectr. 2023 Oct 17;11(5):e0523722. doi: 10.1128/spectrum.05237-22. Epub 2023 Sep 11.
Taxonomic profiling of microbial communities is essential for modeling microbial interactions and informing habitat conservation. This work develops approaches for constructing training/testing data sets from publicly available marine metagenomes and evaluates the performance of machine learning (ML) approaches in read-based taxonomic classification of marine metagenomes. Predictions from two models are used to test the accuracy of metagenomic classification and to guide improvements in ML approaches. Our study provides insights into the methods, results, and challenges of deep learning on marine microbial metagenomic data sets. Future ML approaches can be improved by rectifying genome coverage and class imbalance in the training data sets, developing alternative models, and increasing the accessibility of computational resources for model training and refinement.
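One common way to reduce class imbalance in a training set is to weight each class inversely to its frequency. The following is a minimal sketch of that general heuristic, not the method used in this study: the function name `inverse_frequency_weights` and the taxon labels are hypothetical, chosen only for illustration. The formula, n_samples / (n_classes * class_count), is the widely used "balanced" weighting scheme.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class, inversely proportional to class frequency.

    Hypothetical helper for illustration; it implements the common
    "balanced" heuristic: n_samples / (n_classes * class_count).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        taxon: n_samples / (n_classes * count)
        for taxon, count in counts.items()
    }


# Made-up read labels (not data from the study): one dominant taxon
# and two rarer ones.
labels = ["Proteobacteria"] * 6 + ["Cyanobacteria"] * 3 + ["Bacteroidota"] * 1
weights = inverse_frequency_weights(labels)

for taxon, weight in sorted(weights.items()):
    print(f"{taxon}: {weight:.3f}")
```

Under these assumptions, reads from rare taxa receive larger weights, so each class contributes equally to a weighted training loss. Such weights could be passed to a weighted loss function or used to drive resampling of the training reads.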