Compare the performance of multiple binary classification models in microbial high-throughput sequencing datasets.

Author information

College of Environment, Zhejiang University of Technology, Hangzhou, Zhejiang 310032, PR China.

College of Mathematics and Informatics, South China Agricultural University, Guangzhou, 510642, PR China.

Publication information

Sci Total Environ. 2022 Sep 1;837:155807. doi: 10.1016/j.scitotenv.2022.155807. Epub 2022 May 7.

Abstract

The development of machine learning and deep learning has provided solutions for predicting the response of microbiota to environmental change from microbial high-throughput sequencing data. However, few studies have specifically compared the performance and practicality of these two types of binary classification models to identify a better algorithm for microbiota data analysis. Here, for the first time, we evaluated the performance, accuracy, and running time of binary classification models built with three machine learning methods, random forest (RF), support vector machine (SVM), and logistic regression (LR), and one deep learning method, back propagation neural network (BPNN). The models were built on microbiota datasets from which low-quality variables had been removed and in which the class imbalance problem had been resolved. Additionally, we optimized the models by tuning. Our study demonstrated that dataset pre-processing was a necessary step in model construction. Among the four binary classification models, BPNN and RF were the most suitable methods for constructing microbiota binary classification models. When the four models were used to predict multiple microbial datasets, BPNN showed the highest accuracy and the most robust performance, while RF ranked second. We also constructed the optimal models by adjusting the epochs of BPNN and the n_estimators of RF six times. This evaluation of model performance provides a road map for applying artificial intelligence to the assessment of microbial ecology.
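The abstract does not say how low-quality variables were removed or how the class imbalance was resolved. As a minimal sketch of what such pre-processing could look like, the Python below assumes a prevalence filter (dropping taxa that are non-zero in too few samples) and random oversampling of the minority class. The function names, the 0.2 threshold, and the toy abundance table are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def filter_low_prevalence(rows, min_prevalence=0.2):
    """Keep only feature columns that are non-zero in at least
    `min_prevalence` of the samples (an assumed quality filter)."""
    n_samples = len(rows)
    n_features = len(rows[0])
    keep = [
        j for j in range(n_features)
        if sum(1 for row in rows if row[j] > 0) / n_samples >= min_prevalence
    ]
    return [[row[j] for j in keep] for row in rows], keep


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy abundance table: 6 samples x 4 taxa, binary labels (made-up numbers).
abundance = [
    [12, 0, 3, 0],
    [8, 0, 0, 0],
    [15, 1, 4, 0],
    [9, 0, 2, 0],
    [0, 0, 7, 5],
    [1, 0, 9, 0],
]
labels = [0, 0, 0, 0, 1, 1]

filtered, kept_columns = filter_low_prevalence(abundance, min_prevalence=0.2)
balanced_rows, balanced_labels = random_oversample(filtered, labels)

print("kept taxa columns:", kept_columns)
print("class counts after oversampling:", Counter(balanced_labels))
```

In practice, oversampling of this kind is normally applied to the training split only, so that duplicated samples do not leak into the data used to measure accuracy.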
