Using demographics toward efficient data classification in citizen science: a Bayesian approach.

Author information

De Lellis Pietro, Nakayama Shinnosuke, Porfiri Maurizio

Affiliations

Department of Electrical Engineering and Information Technology, University of Naples Federico II, Naples, Italy.

Department of Mechanical and Aerospace Engineering, New York University Tandon School of Engineering, Brooklyn, NY, USA.

Publication information

PeerJ Comput Sci. 2019 Nov 25;5:e239. doi: 10.7717/peerj-cs.239. eCollection 2019.

Abstract

Public participation in scientific activities, often called citizen science, makes it possible to collect and analyze an unprecedentedly large amount of data. However, the diversity of volunteers makes it challenging to obtain accurate information when these data are aggregated. To overcome this problem, we propose a classification algorithm using Bayesian inference that harnesses the diversity of volunteers to improve data accuracy. In the algorithm, each volunteer is assigned to a distinct class based on a survey of either their level of education or their motivation for participating in citizen science. We estimated the behavior of each class from a training set and then used it as prior information to estimate the performance of new volunteers. By applying this approach to an existing citizen science dataset in which images are classified into categories, we demonstrate an improvement in data accuracy compared to traditional majority voting. Our algorithm offers a simple yet powerful way to improve data accuracy when volunteer effort is limited, by predicting the behavior of a class of individuals rather than attempting a granular description of each one.
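The abstract describes the method only at a high level, so the following is a minimal sketch of the general idea, not the authors' implementation. It rests on simplifying assumptions of our own: each volunteer class is summarized by a confusion matrix P(vote | true category) estimated from a labeled training set with Laplace smoothing (the pseudo-count `alpha` is a hypothetical parameter), votes are treated as conditionally independent given the true category, and the category prior is uniform. The function names, the class labels `education_high` and `education_low`, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_class_confusion(training, n_categories, alpha=1.0):
    """Estimate P(vote | true category) separately for each volunteer class.

    training: iterable of (volunteer_class, true_category, vote) triples, with
    categories encoded as integers 0..n_categories-1.
    alpha: hypothetical pseudo-count added to every cell (Laplace smoothing),
    so that no probability is exactly zero.
    """
    counts = defaultdict(
        lambda: [[alpha] * n_categories for _ in range(n_categories)]
    )
    for cls, true_cat, vote in training:
        counts[cls][true_cat][vote] += 1
    # Normalize each row (one row per true category) so that it sums to one.
    return {
        cls: [[c / sum(row) for c in row] for row in rows]
        for cls, rows in counts.items()
    }


def bayesian_label(votes, confusion, prior):
    """Return the category with the highest posterior probability.

    votes: list of (volunteer_class, vote) pairs collected for one image.
    Each volunteer inherits the confusion matrix of their class, and votes are
    assumed conditionally independent given the true category.
    """
    log_post = [math.log(p) for p in prior]
    for cls, vote in votes:
        if cls not in confusion:
            continue  # unseen class: treat the vote as uninformative
        for k in range(len(prior)):
            log_post[k] += math.log(confusion[cls][k][vote])
    return max(range(len(prior)), key=lambda k: log_post[k])


def majority_label(votes):
    """Baseline: the most frequent vote (ties go to the first vote seen)."""
    return Counter(v for _, v in votes).most_common(1)[0][0]


# Hypothetical training set: two education-based classes, two categories.
training = (
    [("education_high", 0, 0)] * 9 + [("education_high", 0, 1)] * 1
    + [("education_high", 1, 1)] * 9 + [("education_high", 1, 0)] * 1
    + [("education_low", 0, 0)] * 8 + [("education_low", 0, 1)] * 2
    + [("education_low", 1, 0)] * 6 + [("education_low", 1, 1)] * 4
)
confusion = fit_class_confusion(training, n_categories=2)

# One new image: a volunteer from the accurate class votes 1,
# while two volunteers from the class biased toward 0 vote 0.
image_votes = [("education_high", 1), ("education_low", 0), ("education_low", 0)]
print(majority_label(image_votes))                         # 0
print(bayesian_label(image_votes, confusion, [0.5, 0.5]))  # 1
```

With this toy data, majority voting returns category 0, whereas the class-informed posterior favors category 1. A vote of 0 from the second class is weak evidence because that class also answers 0 for most images whose true category is 1. Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow when an image collects many votes.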
