Borza Victor A, Estornell Andrew, Clayton Ellen Wright, Ho Chien-Ju, Rothman Russell L, Vorobeychik Yevgeniy, Malin Bradley A
Vanderbilt University, Nashville, TN.
ByteDance Research, San Jose, CA.
AMIA Annu Symp Proc. 2025 May 22;2024:192-201. eCollection 2024.
Large participatory biomedical studies, which recruit individuals to join a dataset, are gaining popularity and investment, especially for analysis by modern AI methods. Because they purposively recruit participants, these studies are uniquely able to address a lack of historical representation, an issue that has affected many biomedical datasets. In this work, we define representativeness as the similarity between a cohort's distribution over a set of attributes and that of a target population; our goal is to mirror the U.S. population across distributions of age, gender, race, and ethnicity. Because many participatory studies recruit at several institutions, we introduce a computational approach that adaptively allocates recruitment resources among sites to improve representativeness. In simulated recruitment of 10,000-participant cohorts from medical centers in the STAR Clinical Research Network, our approach yields a more representative cohort than existing baselines. These results highlight the value of computational modeling in guiding recruitment efforts.
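The idea of allocating recruitment effort among sites to match a target distribution can be illustrated with a minimal sketch. It is not the authors' method, which the abstract does not specify. It assumes that representativeness is scored by the KL divergence from the target distribution to the cohort distribution, that each site's demographic mix is known in advance, and that effort is assigned greedily one batch at a time. The names kl_divergence, greedy_allocate, and the site and target numbers are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def normalize(counts):
    """Turn non-negative counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def greedy_allocate(site_dists, target, rounds, batch=100):
    """Assumed greedy rule: each round, give one batch of recruitment effort
    to the site whose expected recruits bring the cohort closest to the target."""
    cohort = [0.0] * len(target)       # expected count per demographic group
    allocation = [0] * len(site_dists)  # batches assigned to each site
    for _ in range(rounds):
        best_site, best_score = None, float("inf")
        for s, dist in enumerate(site_dists):
            trial = [c + batch * d for c, d in zip(cohort, dist)]
            score = kl_divergence(target, normalize(trial))
            if score < best_score:
                best_site, best_score = s, score
        cohort = [c + batch * d for c, d in zip(cohort, site_dists[best_site])]
        allocation[best_site] += 1
    return allocation, normalize(cohort)


# Hypothetical toy setup: two demographic groups, two sites with mirrored mixes.
target = [0.5, 0.5]
sites = [[0.8, 0.2], [0.2, 0.8]]
allocation, cohort = greedy_allocate(sites, target, rounds=10)
print(allocation, cohort)
```

In this symmetric toy case the greedy rule splits effort evenly between the two sites, since neither site alone matches the target. An adaptive approach would also need to estimate each site's demographic mix from observed recruits, which this sketch leaves out.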