Knapp Emily A, Kress Amii M, Ghidey Ronel, Gorham Tyler J, Galdo Brendan, Petrill Stephen A, Aris Izzuddin M, Bastain Theresa M, Camargo Carlos A, Coccia Michael A, Cragoe Nicholas, Dabelea Dana, Dunlop Anne L, Gebretsadik Tebeb, Hartert Tina, Hipwell Alison E, Johnson Christine C, Karagas Margaret R, LeWinn Kaja Z, Maldonado Luis Enrique, McEvoy Cindy T, Mirzakhani Hooman, O'Connor Thomas G, O'Shea T Michael, Wang Zhu, Wright Rosalind J, Ziegler Katherine, Zhu Yeyi, Bartlett Christopher W, Lau Bryan
From the Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD.
IT Research & Innovation, Nationwide Children's Hospital, Columbus, OH.
Epidemiology. 2025 May 1;36(3):413-424. doi: 10.1097/EDE.0000000000001832. Epub 2025 Apr 1.
Collaborative research consortia provide an efficient way to increase sample size, enabling evaluation of subgroup heterogeneity and rare outcomes. In addition to the missing-data challenges that all cohort studies face, such as nonresponse and attrition, collaborative studies have missing data that arise from differences in the design and measurement of the contributing studies.
We extend ROSETTA, a latent variable method that creates common measures across datasets that collect the same latent constructs with only partial overlap in measures. We use it to define a common measure of socioeconomic status (SES) across cohorts with varying indicators in the Environmental influences on Child Health Outcomes Cohort, a consortium of pregnancy and pediatric cohorts.
Starting with 52 indicators of prenatal SES from 39,372 participants across 53 cohorts, ROSETTA created three factors representing key domains of SES: income and education, insurance and poverty, and unemployment. At least one factor score was available for 34,528 participants, and two factors were available for more participants than any single indicator. The factors fit the data well, had content validity, and were correlated with alternative measures of SES (for the income and education factor, r = 0.40-0.89). Higher SES as measured by the factor scores was associated with lower odds of prenatal smoking; for the income and education factor, the odds ratio was 0.42 (95% confidence interval: 0.38, 0.45). Missing data were reduced compared with most methods, except for multiple imputation.
ROSETTA aids in pooled analysis of individual participant data by creating measures on a common scale and maximizing data in the presence of missing and mismatched measures.