Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA.
Pathogen Genomics Division, Translational Genomics Research Institute, Flagstaff, AZ 86005, USA.
J R Soc Interface. 2021 Nov;18(184):20210610. doi: 10.1098/rsif.2021.0610. Epub 2021 Nov 24.
Citizen science projects have the potential to address hypotheses requiring extremely large datasets that cannot be collected with the financial and labour constraints of most scientific projects. Data collection by the general public could expand the scope of scientific enquiry if these data accurately capture the system under study. However, data collection inconsistencies by the untrained public may result in biased datasets that do not accurately represent the natural world. In this paper, we harness the availability of scientific and public datasets of the Lyme disease tick vector to identify and account for biases in citizen science tick collections. Estimates of tick abundance from the citizen science dataset correspond moderately with estimates from direct surveillance but exhibit consistent biases. These biases can be mitigated by including factors that may impact collector participation or effort in statistical models, which, in turn, result in more accurate estimates of tick population sizes. Accounting for collection biases within large-scale, public participation datasets could update species abundance maps and facilitate using the wealth of citizen science data to answer scientific questions at scales that are not feasible with traditional datasets.