M. Valerio Giuffrida, Feng Chen, Hanno Scharr, Sotirios A. Tsaftaris
1School of Engineering, Institute of Digital Communications, The University of Edinburgh, Edinburgh, EH9 3FB UK.
2IMT School For Advanced Studies Lucca, Piazza San Francesco, 19, 55100 Lucca, Italy.
Plant Methods. 2018 Feb 9;14:12. doi: 10.1186/s13007-018-0278-7. eCollection 2018.
Image-based plant phenotyping has become a powerful tool for unravelling genotype-environment interactions. The use of image analysis and machine learning has become paramount in extracting data from phenotyping experiments. Yet we rely on observer (a human expert) input to perform the phenotyping process. We assume such input to be a 'gold standard' and use it to evaluate software and algorithms and to train learning-based algorithms. However, we should consider whether any variability exists among experienced and non-experienced observers (including plain citizens). Here we design a study that measures such variability in an annotation task for an integer-quantifiable phenotype: the leaf count.
We compare several experienced and non-experienced observers annotating leaf counts in plant images, measuring intra- and inter-observer variability in a controlled study with specially designed annotation tools, and also compare citizens using a distributed, citizen-powered, web-based platform. In the controlled study, observers counted leaves by looking at top-view images taken with low- and high-resolution optics. We assessed whether tools specifically designed for this task can help reduce such variability. We found that the presence of tools helps reduce intra-observer variability, and that although intra- and inter-observer variability is present, it has no effect on longitudinal statistical assessments of leaf count trends. We compared the variability of citizen-provided annotations (from the web-based platform) and found that plain citizens can provide statistically accurate leaf counts. We also compared a recent machine-learning-based leaf counting algorithm and found that, while close in performance, it is still not within inter-observer variability.
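The intra- and inter-observer comparisons above can be sketched in code. This is a minimal illustration, not the paper's actual protocol: the observer names, counts, and the agreement metric (mean absolute difference between integer leaf counts) are all hypothetical assumptions.

```python
# Hypothetical sketch of intra-/inter-observer variability on leaf counts.
# Observers, counts, and the metric (mean absolute difference) are
# illustrative assumptions, not the study's actual data or protocol.
from statistics import mean

# Each list holds leaf counts for the same four images; a second session
# repeats the annotation to expose intra-observer variability.
session1 = {"obs_A": [5, 7, 6, 9], "obs_B": [5, 8, 6, 10]}
session2 = {"obs_A": [5, 7, 7, 9], "obs_B": [6, 8, 6, 9]}

def mean_abs_diff(a, b):
    """Mean absolute difference between two count sequences."""
    return mean(abs(x - y) for x, y in zip(a, b))

# Intra-observer: same observer across repeated sessions.
intra = {o: mean_abs_diff(session1[o], session2[o]) for o in session1}

# Inter-observer: different observers within the same session.
inter = mean_abs_diff(session1["obs_A"], session1["obs_B"])

print(intra)  # per-observer disagreement with themselves
print(inter)  # disagreement between observers
```

In practice, richer agreement statistics (e.g. intraclass correlation) would be used, but the structure of the comparison is the same.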
While the expertise of the observer plays a role, given sufficient statistical power, a collection of non-experienced users and even citizens can be included in image-based phenotyping annotation tasks, as long as the tasks are suitably designed. With these findings, we hope to re-evaluate the expectations we place on automated algorithms: as long as they perform within observer variability, they can be considered a suitable alternative. In addition, we hope to invigorate interest in introducing suitably designed tasks on citizen-powered platforms, not only to obtain useful information (for research) but also to help engage the public in this societally important problem.