Smith Louisa H, Wang Wanjiang, Keefe-Oates Brianna
Department of Public Health and Health Sciences, Bouvé College of Health Sciences, Northeastern University, Boston, MA 02115, United States.
Roux Institute, Northeastern University, Portland, ME 04101, United States.
J Am Med Inform Assoc. 2024 Dec 1;31(12):2789-2799. doi: 10.1093/jamia/ocae195.
The National Institutes of Health's All of Us Research Program addresses gaps in biomedical research by collecting health data from diverse populations. Pregnant individuals have historically been underrepresented in biomedical research, and pregnancy-related research is often limited by data availability, sample size, and inadequate representation of the diversity of pregnant people. All of Us integrates a wealth of health-related data, providing a unique opportunity to conduct comprehensive pregnancy-related research. We aimed to identify pregnancy episodes with high-quality electronic health record (EHR) data in All of Us Research Program data and evaluate the program's utility for pregnancy-related research.
We used a previously published algorithm to identify pregnancy episodes in All of Us EHR data. We described these pregnancies, validated them with All of Us survey data, and compared them to national statistics.
Our study identified 18 970 pregnancy episodes from 14 234 participants; other possible pregnancy episodes had low-quality or insufficient data. Validation against participants who reported a current pregnancy on an All of Us survey found low false-positive and false-negative rates. Demographics were similar in some respects to national data; however, Asian-Americans were underrepresented, and older, highly educated pregnant people were overrepresented.
Our approach demonstrates the capacity of All of Us to support pregnancy research and reveals the diversity of the pregnancy cohort. However, we noted underrepresentation of some demographic groups. Other limitations include measurement error in gestational age and limited data on non-live births.
The wide variety of data in the All of Us program, encompassing EHR, survey, genomic, and fitness tracker data, offers a valuable resource for studying pregnancy, yet care must be taken to avoid biases.
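The validation step described above (checking EHR-derived pregnancy episodes against self-reported current pregnancy on a survey) can be illustrated with a minimal sketch. This is not the authors' code, and the abstract does not state how the rates were defined. The sketch assumes a false positive is a survey date that falls inside an identified episode when no pregnancy was reported, and a false negative is a reported pregnancy with no episode covering the survey date. The function names, record layout, and toy dates are all hypothetical.

```python
from datetime import date


def in_any_episode(survey_date, episodes):
    """True if survey_date falls inside any (start, end) episode, inclusive."""
    return any(start <= survey_date <= end for start, end in episodes)


def error_rates(records):
    """Compare self-report with algorithm-identified episodes.

    Each record is (reported_pregnant, survey_date, episodes), where
    episodes is a list of (start, end) dates for that participant.
    The definitions of the rates are an assumption of this sketch.
    """
    tp = fp = tn = fn = 0
    for reported, survey_date, episodes in records:
        flagged = in_any_episode(survey_date, episodes)
        if reported and flagged:
            tp += 1
        elif reported and not flagged:
            fn += 1
        elif not reported and flagged:
            fp += 1
        else:
            tn += 1
    # Guard against empty denominators rather than raising.
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "false_positive_rate": fpr, "false_negative_rate": fnr}


# Toy, made-up records purely for illustration (not study data).
episode = (date(2020, 2, 1), date(2020, 10, 30))
records = [
    (True, date(2020, 3, 1), [(date(2019, 12, 1), date(2020, 8, 20))]),
    (True, date(2021, 5, 1), []),
    (False, date(2020, 1, 15), []),
    (False, date(2020, 6, 1), [episode]),
    (False, date(2022, 1, 1), [episode]),
]
rates = error_rates(records)
print(rates)
```

In a real analysis the episode boundaries would carry the gestational-age measurement error noted above, so a tolerance window around the survey date may be a reasonable refinement.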