Synthetic data at scale: a development model to efficiently leverage machine learning in agriculture.

Authors

Klein Jonathan, Waller Rebekah, Pirk Sören, Pałubicki Wojtek, Tester Mark, Michels Dominik L

Affiliations

Computational Sciences Group, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.

Center for Desert Agriculture, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.

Publication information

Front Plant Sci. 2024 Sep 16;15:1360113. doi: 10.3389/fpls.2024.1360113. eCollection 2024.

DOI:10.3389/fpls.2024.1360113
PMID:39351023
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11439777/
Abstract

The rise of artificial intelligence (AI) and in particular modern machine learning (ML) algorithms during the last decade has been met with great interest in the agricultural industry. While undisputedly powerful, their main drawback remains the need for sufficient and diverse training data. The collection of real datasets and their annotation are the main cost drivers of ML developments, and while promising results on synthetically generated training data have been shown, its generation is not without difficulties of its own. In this paper, we present a development model for the iterative, cost-efficient generation of synthetic training data. Its application is demonstrated by developing a low-cost early disease detector for tomato plants (Solanum lycopersicum) using synthetic training data. A neural classifier is trained exclusively on synthetic images, whose generation process is iteratively refined to obtain optimal performance. In contrast to other approaches that rely on a human assessment of similarity between real and synthetic data, we instead introduce a structured, quantitative approach. Our evaluation shows superior generalization results when compared to using non-task-specific real training data and a higher cost efficiency of development compared to traditional synthetic training data. We believe that our approach will help to reduce the cost of synthetic data generation in future applications.
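
The iterative refinement described in the abstract can be illustrated with a toy loop. This is a minimal sketch and not the authors' implementation: the one-dimensional "feature", the `GeneratorConfig` knobs, the nearest-centroid classifier, the random-perturbation search, and the use of accuracy on a small real validation set as the quantitative signal are all assumptions made for illustration. The abstract does not specify how the quantitative assessment is computed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratorConfig:
    """Hypothetical knobs of a synthetic-data generator, reduced to one feature."""
    healthy_mean: float
    diseased_mean: float
    noise: float


def generate(config, n, rng):
    """Return n labeled samples as (feature, label); label 1 means diseased."""
    samples = []
    for i in range(n):
        label = i % 2  # alternate labels so both classes are balanced
        mean = config.diseased_mean if label else config.healthy_mean
        samples.append((rng.gauss(mean, config.noise), label))
    return samples


def train_centroid_classifier(samples):
    """Fit a nearest-centroid classifier and return its predict function."""
    means = {}
    for label in (0, 1):
        values = [x for x, y in samples if y == label]
        means[label] = sum(values) / len(values)
    return lambda x: min(means, key=lambda lbl: abs(x - means[lbl]))


def accuracy(predict, samples):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in samples) / len(samples)


def refine(config, real_validation, rounds=20, seed=0):
    """Perturb the generator config and keep changes that raise real-data accuracy."""
    rng = random.Random(seed)

    def score(cfg):
        # Train on synthetic data only; evaluate on real data only.
        # A fixed seed keeps the comparison between configs deterministic.
        predict = train_centroid_classifier(generate(cfg, 200, random.Random(seed)))
        return accuracy(predict, real_validation)

    best, best_score = config, score(config)
    history = [best_score]
    for _ in range(rounds):
        candidate = GeneratorConfig(
            best.healthy_mean + rng.uniform(-0.5, 0.5),
            best.diseased_mean + rng.uniform(-0.5, 0.5),
            max(0.05, best.noise + rng.uniform(-0.1, 0.1)),
        )
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
        history.append(best_score)
    return best, history


# Stand-in for a small real validation set (itself simulated in this sketch).
real_validation = generate(GeneratorConfig(1.0, 3.0, 0.5), 100, random.Random(42))

# Start from a deliberately mismatched generator and refine it.
best, history = refine(GeneratorConfig(0.0, 1.0, 0.5), real_validation)
print(f"accuracy before: {history[0]:.2f}, after: {history[-1]:.2f}")
```

The point of the sketch is the control flow: the classifier never sees real data during training, and each change to the generator is accepted or rejected by a measured score rather than by a visual comparison of real and synthetic samples.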

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc71/11439777/8afc68856046/fpls-15-1360113-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc71/11439777/718c3429237a/fpls-15-1360113-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc71/11439777/a0e6656bb44c/fpls-15-1360113-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc71/11439777/5ef7a38577e0/fpls-15-1360113-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc71/11439777/89b1ede4031b/fpls-15-1360113-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc71/11439777/1475c86af817/fpls-15-1360113-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc71/11439777/225c59c29d2f/fpls-15-1360113-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc71/11439777/2740b561335b/fpls-15-1360113-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc71/11439777/ec07799ab8a9/fpls-15-1360113-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc71/11439777/79b9669d0c3f/fpls-15-1360113-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc71/11439777/816f64ea7369/fpls-15-1360113-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc71/11439777/d61a62093f46/fpls-15-1360113-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc71/11439777/6965e4868706/fpls-15-1360113-g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc71/11439777/aa9978f00b80/fpls-15-1360113-g014.jpg
