Thapa Ranjita, Zhang Kai, Snavely Noah, Belongie Serge, Khan Awais
Plant Pathology and Plant-Microbe Biology Section, Cornell University, Geneva, New York 14456, USA.
Cornell Tech, 2 W Loop Road, New York 10044, USA.
Appl Plant Sci. 2020 Sep 28;8(9):e11390. doi: 10.1002/aps3.11390. eCollection 2020 Sep.
Apple orchards in the United States are under constant threat from a large number of pathogens and insects. Appropriate and timely deployment of disease management depends on early disease detection. Incorrect or delayed diagnosis can lead to excessive or inadequate use of chemicals, raising production costs and increasing environmental and health impacts.
We have manually captured 3651 high-quality, real-life symptom images of multiple apple foliar diseases, with variable illumination, angles, surfaces, and noise. A subset of images, expert-annotated to create a pilot data set for apple scab, cedar apple rust, and healthy leaves, was made available to the Kaggle community for the Plant Pathology Challenge as part of the Fine-Grained Visual Categorization (FGVC) workshop at the 2020 Computer Vision and Pattern Recognition conference (CVPR 2020). Participants were asked to use the image data set to train a machine learning model to classify disease categories and develop an algorithm for disease severity quantification. The top three area under the ROC curve (AUC) values submitted to the private leaderboard were 0.98445, 0.98182, and 0.98089. We also trained an off-the-shelf convolutional neural network on this data for disease classification and achieved 97% accuracy on a held-out test set.
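The leaderboard values above are ROC AUC scores for a multi-class problem. The abstract does not state how AUC was aggregated across classes, so the sketch below assumes a one-vs-rest AUC per class, averaged over classes. The function names, the class order (healthy, scab, rust), and the toy scores are hypothetical and for illustration only; they are not the challenge data or the organizers' scoring code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC for one class: the probability that a randomly chosen positive
    example receives a higher score than a randomly chosen negative one
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_columnwise_auc(
    y_true: Sequence[Sequence[int]], y_score: Sequence[Sequence[float]]
) -> float:
    """Average the one-vs-rest AUC over classes (columns).
    Assumption: this aggregation is illustrative, not taken from the abstract."""
    n_classes = len(y_true[0])
    per_class = [
        roc_auc([row[k] for row in y_true], [row[k] for row in y_score])
        for k in range(n_classes)
    ]
    return sum(per_class) / n_classes


# Hypothetical toy example; columns are (healthy, scab, rust).
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_score = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.2, 0.3],  # a scab leaf scored as most likely healthy
]
score = mean_columnwise_auc(y_true, y_score)
print(round(score, 4))
```

Because AUC depends only on how the scores rank positives against negatives, it is insensitive to the decision threshold, which is one reason it may be reported alongside accuracy for a held-out test set.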
This data set will contribute toward development and deployment of machine learning-based automated plant disease classification algorithms to ultimately realize fast and accurate disease detection. We will continue to add images to the pilot data set for a larger, more comprehensive expert-annotated data set for future Kaggle competitions and to explore more advanced methods for disease classification and quantification.