Microbiome-based classification models for fresh produce safety and quality evaluation.

Affiliations

Department of Food Science and Technology, University of California Davis, Davis, California, USA.

Department of Molecular and Cellular Biology, University of California Davis, Davis, California, USA.

Publication information

Microbiol Spectr. 2024 Apr 2;12(4):e0344823. doi: 10.1128/spectrum.03448-23. Epub 2024 Mar 6.

Abstract

Small sample sizes and the loss of sequencing reads during microbiome data preprocessing can limit the statistical power to differentiate fresh produce phenotypes and prevent the detection of important bacterial species associated with produce contamination or quality reduction. Here, we explored a machine learning-based k-mer hash analysis strategy to identify DNA signatures predictive of produce safety (PS) and produce quality (PQ) and compared it against the amplicon sequence variant (ASV) strategy, which uses a typical denoising step, and against the ASV-based taxonomy strategy. Random forest-based classifiers for PS and PQ using 7-mer hash data sets had significantly higher classification accuracy than those using the ASV data sets. We also demonstrated that integrating multiple data sets and leveraging a 7-mer hash strategy leads to better classification performance for PS and PQ than the ASV method, but to lower PS classification accuracy than the feature-selected ASV-based taxonomy strategy. Given the current limitations of generating taxonomy with the 7-mer hash strategy, the ASV-based taxonomy strategy, which requires remarkably less computing time and memory, is more efficient for PS and PQ classification and is applicable to the identification of important taxa. The results of this study lay the foundation for future studies that need to incorporate and/or compare different microbiome sequencing data sets when applying machine learning to the microbial safety and quality of food.
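
The k-mer hash idea can be illustrated with a minimal sketch. It assumes that each read is split into overlapping 7-mers, that each 7-mer is mapped to a hash, and that hash counts per sample form the feature vector. The abstract does not state which hashing tool or hash function was used, so the MD5 digest, the function names, and the toy reads below are illustrative assumptions only. The random forest step is omitted.

```python
import hashlib
from collections import Counter

K = 7  # k-mer length reported in the abstract


def kmer_hash(kmer):
    """Map a k-mer to a stable hash string (MD5 is an illustrative choice)."""
    return hashlib.md5(kmer.encode("ascii")).hexdigest()


def kmer_hash_profile(reads, k=K):
    """Count hashed k-mers over all reads of one sample."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer_hash(kmer)] += 1
    return counts


def build_feature_matrix(profiles):
    """Align per-sample profiles on one shared, sorted set of hash features."""
    features = sorted(set().union(*profiles.values()))
    matrix = {
        sample: [profile.get(f, 0) for f in features]
        for sample, profile in profiles.items()
    }
    return features, matrix


# Hypothetical toy reads standing in for samples from two different studies.
profiles = {
    "study_a_sample": kmer_hash_profile(["ACGTACGTAC"]),
    "study_b_sample": kmer_hash_profile(["ACGTACGNAC"]),
}
features, matrix = build_feature_matrix(profiles)
```

Because the same k-mer always produces the same hash, samples from different studies share one feature space without a study-specific denoising step, which is one plausible reason the hash strategy lends itself to integrating data sets. The resulting rows could then be passed to any classifier.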

IMPORTANCE

Identification of generalizable indicators for produce safety (PS) and produce quality (PQ) improves the detection of produce contamination and quality decline. However, the loss of effective sequencing reads during microbiome data preprocessing and the limited sample sizes of individual studies restrict the statistical power to identify important features that differentiate PS and PQ phenotypes. We applied machine learning-based models using individual and integrated k-mer hash and amplicon sequence variant (ASV) data sets for PS and PQ classification and evaluated their classification performance. Random forest (RF)-based models using integrated 7-mer hash data sets achieved significantly higher PS and PQ classification accuracy. Because taxonomic analysis is limited for the 7-mer hash, we also developed RF-based models using feature-selected ASV-based taxonomic data sets, which achieved better PS classification performance than those using the integrated 7-mer hash data set. The RF feature selection method identified 480 PS indicators and 263 PQ indicators with a positive contribution to PS and PQ classification.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ba2/10986475/c753b626c8bc/spectrum.03448-23.f001.jpg