Baminiwatte Ranga, Torsu Blessing, Scherbakov Dmitry, Mollalo Abolfazl, Obeid Jihad S, Alekseyenko Alexander V, Lenert Leslie A
Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina (MUSC), Charleston, SC 29425, USA.
Int J Med Inform. 2025 Mar;195:105766. doi: 10.1016/j.ijmedinf.2024.105766. Epub 2024 Dec 19.
This scoping review aims to clarify the definition and trajectory of citizen-led scientific research (so-called citizen science) within the healthcare domain, and to examine both the degree of integration of machine learning (ML) and the participation levels of citizen scientists in health-related projects.
In January and September 2024, we conducted a comprehensive search of PubMed, Scopus, Web of Science, and the EBSCOhost platform for peer-reviewed publications that combine citizen science and ML in healthcare. Articles were excluded if citizens were merely passive data providers or if only professional scientists were involved.
Of the 1,395 articles initially screened, 56 published between 2013 and 2024 met the inclusion criteria. Most research projects were conducted in the U.S. (n = 20, 35.7 %), followed by Germany (n = 6, 10.7 %), with Spain, Canada, and the UK each contributing three studies (5.4 %). Data collection was the primary form of citizen scientist involvement (n = 29, 51.8 %) and included capturing images, sharing data online, and mailing samples. Data annotation was the next most common activity (n = 15, 26.8 %), followed by participation in ML model challenges (n = 8, 14.3 %) and contributions to decision-making (n = 3, 5.4 %). Mosquitoes (n = 10, 34.5 %) and air pollution samples (n = 7, 24.2 %) were the main data objects collected by citizens for ML analysis. Classification was the most prevalent ML task (n = 30, 52.6 %), and Convolutional Neural Networks were the most frequently used algorithm (n = 13, 20 %).
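The involvement percentages above can be re-derived from the reported counts. The following is a minimal sketch, not the authors' analysis code: the variable and function names are illustrative, and the denominator of 29 used for the mosquito share is an assumption (the number of data-collection projects), chosen because it reproduces the reported 34.5 %.

```python
# Counts as reported in the results; 56 is the number of included articles.
INCLUDED_ARTICLES = 56

involvement_counts = {
    "data collection": 29,
    "data annotation": 15,
    "ML model challenges": 8,
    "decision-making": 3,
}


def share(count: int, total: int) -> float:
    """Return count / total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Share of the 56 included articles for each form of involvement.
involvement_shares = {
    activity: share(count, INCLUDED_ARTICLES)
    for activity, count in involvement_counts.items()
}

# ASSUMPTION: the mosquito percentage is taken over the 29 data-collection
# projects rather than over all 56 articles; the source does not state this.
mosquito_share = share(10, involvement_counts["data collection"])

for activity, pct in involvement_shares.items():
    print(f"{activity}: {pct} %")
print(f"mosquitoes (of data-collection projects, assumed): {mosquito_share} %")
```

Note that the four listed involvement categories sum to 55 of the 56 included articles, so the shares computed against 56 do not add up to 100 %.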
Citizen science in healthcare is currently a predominantly American and European construct that is expanding in Asia. Citizens contribute data and label data for ML methods, but only infrequently analyze data or lead studies. Projects described as using "crowd-sourced" data and those described as "citizen science" should be differentiated according to the degree of citizen involvement.