Importance of missingness in baseline variables: A case study of the All of Us Research Program.

Affiliations

Department of Internal Medicine, The Ohio State University, Columbus, Ohio, United States of America.

Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America.

Publication information

PLoS One. 2023 May 18;18(5):e0285848. doi: 10.1371/journal.pone.0285848. eCollection 2023.

Abstract

OBJECTIVE

The All of Us Research Program collects data from multiple information sources, including health surveys, to build a national longitudinal research repository that researchers can use to advance precision medicine. Missing survey responses pose challenges to study conclusions. We describe missingness in All of Us baseline surveys.

STUDY DESIGN AND SETTING

We extracted survey responses from May 31, 2017, to September 30, 2020. Missing percentages for groups historically underrepresented in biomedical research were compared with those for represented groups. Associations of missing percentages with age, health literacy score, and survey completion date were evaluated. We used negative binomial regression to evaluate the association of participant characteristics with the number of skipped questions, out of the total number of eligible questions, for each participant.
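A regression on "skipped questions out of eligible questions" is usually written as a count model with an offset. The block below is a standard formulation, assuming a log link, the common NB2 variance, and the log of each participant's number of eligible questions as the offset; the abstract does not state the exact parameterization or covariate list, so the symbols ($Y_i$ skipped questions, $n_i$ eligible questions, $\mathbf{x}_i$ participant characteristics, $\alpha$ dispersion) are illustrative.

```latex
\begin{aligned}
Y_i &\sim \operatorname{NegBin}(\mu_i, \alpha), \qquad \operatorname{Var}(Y_i) = \mu_i + \alpha\,\mu_i^{2},\\
\log \mu_i &= \log n_i + \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta},\\
\frac{\mu_i}{n_i} &= \exp\!\left(\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}\right), \qquad \mathrm{IRR}_j = e^{\beta_j}.
\end{aligned}
```

Under this form, an IRR multiplies the expected share of skipped questions: for example, the reported IRR of 1.26 for Black/African American compared with White participants corresponds to a 26% higher expected skip rate, with the other covariates held fixed.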

RESULTS

The analyzed dataset contained data for 334,183 participants who submitted at least one baseline survey. Almost all participants (97.0%) completed all baseline surveys, and only 541 (0.2%) skipped all questions in at least one of the baseline surveys. The median skip rate was 5.0% of questions (interquartile range (IQR), 2.5% to 7.9%). Historically underrepresented groups were associated with higher missingness (incidence rate ratio (IRR) [95% CI] for Black/African American compared with White participants: 1.26 [1.25, 1.27]). Missing percentages were similar across survey completion dates, participant ages, and health literacy scores. Skipping specific questions was associated with higher missingness (IRR [95% CI]: 1.39 [1.38, 1.40] for skipping income, 1.92 [1.89, 1.95] for skipping education, and 2.19 [2.09, 2.30] for skipping sexual and gender questions).
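The descriptive figures above (a median skip rate with its IQR, and a rate ratio between groups) come from per-participant counts. The following is a minimal sketch, assuming each participant record holds a group label, a count of skipped questions, and a count of eligible questions; the records, group labels, and function names are hypothetical, and the toy values are not All of Us data. The ratio computed here is crude (unadjusted), so it is not the adjusted negative binomial IRR reported in the study.

```python
import statistics

# Hypothetical toy records: (group label, skipped questions, eligible questions).
# Illustrative values only, NOT All of Us data.
records = [
    ("A", 2, 40), ("A", 3, 40), ("A", 1, 38), ("A", 4, 42),
    ("B", 1, 40), ("B", 2, 40), ("B", 0, 39), ("B", 3, 41),
]


def skip_percentages(rows):
    """Per-participant skip rate in percent: 100 * skipped / eligible."""
    return [100.0 * skipped / eligible for _, skipped, eligible in rows]


def median_iqr(values):
    """Return the median and the (Q1, Q3) quartiles.

    Uses the 'inclusive' quartile method; other software may interpolate
    differently and give marginally different quartiles.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), (q1, q3)


def crude_rate_ratio(rows, group, reference):
    """Unadjusted ratio of pooled skip rates (total skipped / total eligible)."""

    def pooled_rate(label):
        skipped = sum(s for g, s, _ in rows if g == label)
        eligible = sum(e for g, _, e in rows if g == label)
        return skipped / eligible

    return pooled_rate(group) / pooled_rate(reference)


pct = skip_percentages(records)
med, (q1, q3) = median_iqr(pct)
print(f"median skip rate {med:.1f}% (IQR {q1:.1f}% to {q3:.1f}%)")
# prints "median skip rate 5.0% (IQR 2.6% to 7.4%)"
print(f"crude rate ratio, A vs B: {crude_rate_ratio(records, 'A', 'B'):.2f}")
# prints "crude rate ratio, A vs B: 1.67"
```

Under a Poisson model with only a group indicator and a log(eligible) offset, the exponentiated indicator coefficient equals this crude ratio exactly; a negative binomial fit with the same terms, or a model that also adjusts for characteristics such as age, will generally give a different estimate.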

CONCLUSION

Surveys in the All of Us Research Program will form an essential component of the data researchers can use to perform their analyses. Missingness was low in All of Us baseline surveys, but group differences exist. Additional statistical methods and careful analysis of surveys could help mitigate challenges to the validity of conclusions.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1bf/10194909/27fc9b6512ae/pone.0285848.g001.jpg
