Biodata Mining Group, Faculty of Technology, Bielefeld University, Bielefeld, Germany.
National Oceanography Centre, University of Southampton Waterfront Campus, Southampton, United Kingdom.
PLoS One. 2019 Jun 12;14(6):e0218086. doi: 10.1371/journal.pone.0218086. eCollection 2019.
The evaluation of large amounts of digital image data is of growing importance for biology, including for the exploration and monitoring of marine habitats. However, only a tiny percentage of the image data collected is evaluated by marine biologists, who manually interpret and annotate the image contents, a slow and laborious process. To overcome this bottleneck in image annotation, two strategies are increasingly proposed: "citizen science" and "machine learning". In this study, we investigated how the combination of citizen science, to detect objects, and machine learning, to classify megafauna, could be used to automate the annotation of underwater images. For this purpose, we simulated multiple large data sets of citizen science annotations by modifying "gold standard" annotations made by an experienced marine biologist, introducing different degrees of the errors and inaccuracies commonly observed in citizen science data. The parameters of the simulation were determined on the basis of two citizen science experiments. This allowed us to analyze the relationship between the outcome of a citizen science study and the quality of the classifications of a deep learning megafauna classifier. The results show great potential for combining citizen science with machine learning, provided that the participants are informed precisely about the annotation protocol. Inaccuracies in the position of the annotation had the most substantial influence on the classification accuracy, whereas the size of the marking and false positive detections had a smaller influence.
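The simulation idea described in the abstract, degrading gold standard annotations along the three error types it names (position, marking size, false positives), can be illustrated with a minimal sketch. This is not the authors' implementation: the circular `Annotation` type, the function name `simulate_citizen_annotations`, the Gaussian noise model, and the parameters `pos_sigma`, `size_sigma`, and `fp_rate` are all assumptions made for illustration. In the study itself, the simulation parameters were derived from two citizen science experiments.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical circular marking: center (x, y) and radius, in pixels."""
    x: float
    y: float
    radius: float
    is_false_positive: bool = False


def simulate_citizen_annotations(gold, image_size, pos_sigma, size_sigma, fp_rate, rng):
    """Return a noisy copy of `gold` (assumed noise model, not the paper's).

    pos_sigma  -- std. dev. of Gaussian position jitter, in pixels
    size_sigma -- std. dev. of the relative change in marking radius
    fp_rate    -- false positives added per gold annotation
    """
    width, height = image_size
    noisy = []
    for a in gold:
        # Position error: jitter the center, then clamp it to the image.
        x = min(max(a.x + rng.gauss(0.0, pos_sigma), 0.0), width)
        y = min(max(a.y + rng.gauss(0.0, pos_sigma), 0.0), height)
        # Size error: scale the radius, keeping it at least one pixel.
        radius = max(a.radius * (1.0 + rng.gauss(0.0, size_sigma)), 1.0)
        noisy.append(Annotation(x, y, radius))

    # False positives: markings at random positions where no object exists.
    n_false = round(fp_rate * len(gold))
    if n_false:
        mean_radius = sum(a.radius for a in gold) / len(gold)
        for _ in range(n_false):
            noisy.append(Annotation(rng.uniform(0.0, width),
                                    rng.uniform(0.0, height),
                                    mean_radius,
                                    is_false_positive=True))
    return noisy


# Usage: ten gold markings, moderate position noise, 20% false positives.
gold = [Annotation(100.0 * i, 50.0 * i, 20.0) for i in range(1, 11)]
noisy = simulate_citizen_annotations(
    gold, image_size=(2000, 1000), pos_sigma=15.0,
    size_sigma=0.2, fp_rate=0.2, rng=random.Random(0))
```

Varying one parameter at a time while holding the others fixed, then training and evaluating a classifier on patches cut around the noisy markings, would be one way to compare the influence of each error type on classification accuracy.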