

Spectroscopy Approaches for Food Safety Applications: Improving Data Efficiency Using Active Learning and Semi-supervised Learning.

Author information

Zhang Huanle, Wisuthiphaet Nicharee, Cui Hemiao, Nitin Nitin, Liu Xin, Zhao Qing

Affiliations

Department of Computer Science, University of California, Davis, Davis, CA, United States.

Department of Food Science and Technology, University of California, Davis, Davis, CA, United States.

Publication information

Front Artif Intell. 2022 Jun 22;5:863261. doi: 10.3389/frai.2022.863261. eCollection 2022.

Abstract

The past decade witnessed rapid development in measurement and monitoring technologies for food science. Among these technologies, spectroscopy has been widely used to analyze food quality, safety, and nutritional properties. Due to the complexity of food systems and the lack of comprehensive predictive models, rapid and simple measurements that predict complex properties in food systems are largely missing. Machine Learning (ML) has shown great potential to improve the classification and prediction of these properties. However, the barriers to collecting large datasets for ML applications still persist. In this paper, we explore different approaches to data annotation and model training to improve data efficiency for ML applications. Specifically, we leverage Active Learning (AL) and Semi-Supervised Learning (SSL) and investigate four approaches: baseline passive learning, AL, SSL, and a hybrid of AL and SSL. To evaluate these approaches, we collect two spectroscopy datasets: one for predicting plasma dosage and one for detecting foodborne pathogens. Our experimental results show that, compared to the passive learning approach, the advanced approaches (AL, SSL, and the hybrid) can greatly reduce the number of labeled samples required, in some cases by more than half.
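
The abstract names the four approaches but does not specify the classifier, the AL query strategy, or the SSL labeling rule. As an illustration only, the sketch below shows one round of a hybrid AL and SSL loop under assumptions that are not taken from the paper: a nearest-centroid classifier, margin-based uncertainty sampling for AL, and margin-thresholded self-training for SSL. All names (`hybrid_round`, `oracle`, `confident_margin`) are hypothetical.

```python
import math

def class_centroids(X, y):
    """Mean feature vector for each class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

def predict_with_margin(centroids, x):
    """Return (nearest class, margin). A small margin means an uncertain prediction."""
    dists = sorted((math.dist(x, c), label) for label, c in centroids.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def hybrid_round(X_lab, y_lab, X_pool, oracle, n_query=1, confident_margin=2.0):
    """One hybrid round: query the least certain samples (AL) and
    pseudo-label the most certain ones (SSL). Assumed scheme, not the paper's."""
    centroids = class_centroids(X_lab, y_lab)
    scored = [predict_with_margin(centroids, x) for x in X_pool]
    # AL step: pick the n_query pool samples with the smallest margin.
    order = sorted(range(len(X_pool)), key=lambda i: scored[i][1])
    queried = set(order[:n_query])
    X_new, y_new, remaining = list(X_lab), list(y_lab), []
    for i, (x, (pred, margin)) in enumerate(zip(X_pool, scored)):
        if i in queried:
            # Human annotation, the expensive step that AL tries to minimize.
            X_new.append(x)
            y_new.append(oracle(x))
        elif margin >= confident_margin:
            # SSL step: accept the model's own label without annotation cost.
            X_new.append(x)
            y_new.append(pred)
        else:
            remaining.append(x)
    return X_new, y_new, remaining

if __name__ == "__main__":
    # Toy 2-D "spectra"; the oracle stands in for a human annotator.
    X_lab, y_lab = [[0.0, 0.0], [10.0, 10.0]], [0, 1]
    X_pool = [[1.0, 1.0], [9.0, 9.0], [5.0, 5.2], [4.0, 4.0]]
    result = hybrid_round(X_lab, y_lab, X_pool,
                          oracle=lambda x: 1 if sum(x) > 10 else 0,
                          n_query=1, confident_margin=5.0)
    print(result)
```

In this toy run only one sample is sent to the annotator while two others are labeled by the model itself, which is the sense in which such schemes aim to reduce labeling effort. Setting `n_query=0` gives a pure SSL round, and setting `confident_margin=float("inf")` gives a pure AL round.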


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e397/9257238/2bb2be6b3938/frai-05-863261-g0001.jpg
