Department of Physics, University of Winnipeg, Winnipeg, Manitoba, Canada.
Department of Applied Computer Science, University of Winnipeg, Winnipeg, Manitoba, Canada.
PLoS One. 2020 Dec 17;15(12):e0243923. doi: 10.1371/journal.pone.0243923. eCollection 2020.
A lack of sufficient training data, both in terms of variety and quantity, is often the bottleneck in the development of machine learning (ML) applications in any domain. For agricultural applications, ML-based models designed to perform tasks such as autonomous plant classification will typically be coupled to just one or perhaps a few plant species. As a consequence, each crop-specific task is very likely to require its own specialized training data, and the question of how to serve this need for data now often overshadows the more routine exercise of actually training such models. To tackle this problem, we have developed an embedded robotic system to automatically generate and label large datasets of plant images for ML applications in agriculture. The system can image plants from virtually any angle, thereby ensuring a wide variety of data. With an imaging rate of up to one image per second, it can produce labeled datasets on the scale of thousands to tens of thousands of images per day. As such, this system offers an important alternative to time- and cost-intensive methods of manual generation and labeling. Furthermore, the use of a uniform background made of blue keying fabric enables additional image processing techniques such as background replacement and image segmentation. It also helps in the training process, essentially forcing the model to focus on the plant features and eliminating random correlations. To demonstrate the capabilities of our system, we generated a dataset of over 34,000 labeled images, with which we trained an ML model to distinguish grasses from non-grasses in test data from a variety of sources. We now plan to generate much larger datasets of Canadian crop plants and weeds that will be made publicly available in the hope of further enabling ML applications in the agriculture sector.
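To illustrate how a uniform blue background can support segmentation and background replacement, the following is a minimal sketch and not the authors' pipeline. It assumes 8-bit RGB images stored as NumPy arrays and uses a simple "blue dominates red and green" rule. The function names `blue_mask` and `replace_background` and the `margin` threshold are hypothetical choices made for this example.

```python
import numpy as np


def blue_mask(image: np.ndarray, margin: int = 40) -> np.ndarray:
    """Return a boolean H x W mask that is True where a pixel looks like blue backdrop.

    image:  H x W x 3 uint8 array in RGB channel order (assumed).
    margin: illustrative threshold; blue must exceed both red and green by this much.
    """
    rgb = image.astype(np.int16)  # widen so the subtractions cannot wrap around
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (b - r > margin) & (b - g > margin)


def replace_background(image: np.ndarray,
                       new_background: np.ndarray,
                       margin: int = 40) -> np.ndarray:
    """Copy `image`, swapping every backdrop pixel for the matching pixel of `new_background`."""
    if image.shape != new_background.shape:
        raise ValueError("image and new_background must have the same shape")
    mask = blue_mask(image, margin)
    out = image.copy()
    out[mask] = new_background[mask]  # boolean indexing selects backdrop pixels only
    return out


if __name__ == "__main__":
    # Tiny synthetic 2 x 2 image: two blue backdrop pixels, one green, one near-grey.
    img = np.array([[[10, 20, 200], [30, 160, 40]],
                    [[0, 0, 255], [200, 200, 210]]], dtype=np.uint8)
    grey = np.full_like(img, 128)
    print(blue_mask(img))
    print(replace_background(img, grey))
```

The inverse of the mask (`~blue_mask(img)`) gives a rough plant segmentation. A real setup would likely need a threshold tuned to the fabric and lighting, or a comparison in a hue-based colour space.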