Precision Health Informatics Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, United States.
Department of Health and Society, University of Toronto, Scarborough, Toronto, ON, Canada.
J Am Med Inform Assoc. 2023 Dec 22;31(1):139-153. doi: 10.1093/jamia/ocad205.
The All of Us Research Program (All of Us) aims to recruit over a million participants to further precision medicine. Replicating known associations is an essential step in establishing the validity of a biobank. Here, we evaluated how well All of Us data replicated known cigarette smoking associations.
We defined three smoking exposures: (1) EHR Smoking, based on International Classification of Diseases codes in the electronic health record (EHR); (2) participant-provided information (PPI) Ever Smoking; and (3) PPI Current Smoking, the latter two drawn from the lifestyle survey. We performed a phenome-wide association study (PheWAS) for each smoking exposure. For each, we compared the PheWAS effect sizes with those reported in published meta-analyses of cigarette smoking identified in PubMed. We defined two levels of replication of a meta-analysis: (1) nominally replicated, which required agreement in the direction of effect; and (2) fully replicated, which required overlapping confidence intervals.
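The two replication levels can be sketched as a small classifier. This is only an illustrative sketch, not the authors' code: the `Estimate` type, the function names, and the example numbers are hypothetical, effect sizes are assumed to be odds ratios with a null value of 1, and "fully replicated" is assumed to also require agreement in direction. The significance filter applied to the PheWAS results is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """An effect size with its confidence interval (assumed: odds ratio scale)."""
    odds_ratio: float
    ci_low: float
    ci_high: float


def same_direction(a: Estimate, b: Estimate, null: float = 1.0) -> bool:
    """True when both estimates lie on the same side of the null value."""
    return (a.odds_ratio - null) * (b.odds_ratio - null) > 0


def intervals_overlap(a: Estimate, b: Estimate) -> bool:
    """True when the two confidence intervals share at least one point."""
    return a.ci_low <= b.ci_high and b.ci_low <= a.ci_high


def replication_level(phewas: Estimate, meta: Estimate) -> str:
    """Classify a PheWAS estimate against a published meta-analysis estimate."""
    if not same_direction(phewas, meta):
        return "not replicated"
    if intervals_overlap(phewas, meta):
        # Assumption: full replication is a stricter subset of nominal replication.
        return "fully replicated"
    return "nominally replicated"


# Hypothetical numbers, for illustration only.
phewas = Estimate(odds_ratio=1.8, ci_low=1.5, ci_high=2.2)
print(replication_level(phewas, Estimate(2.0, 1.7, 2.4)))
print(replication_level(phewas, Estimate(3.5, 3.0, 4.0)))
print(replication_level(phewas, Estimate(0.8, 0.6, 0.95)))
```

Under these assumptions, every fully replicated association is also counted as nominally replicated, which is consistent with the nominal counts being at least as large as the full counts.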
PheWASes with EHR Smoking, PPI Ever Smoking, and PPI Current Smoking revealed 736, 492, and 639 phenome-wide significant associations, respectively. We identified 165 meta-analyses representing 99 distinct phenotypes that could be matched to EHR phenotypes. At P < .05, 74 were nominally replicated and 55 were fully replicated. At P < 2.68 × 10⁻⁵ (Bonferroni threshold), 58 were nominally replicated and 40 were fully replicated.
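A Bonferroni threshold is the family-wise significance level divided by the number of tests. The abstract does not state how many phenotypes were tested, so the count used below is an assumption, back-calculated from the reported threshold with a family-wise level of 0.05. The function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Assumption: about 1866 tests, inferred from 0.05 / 2.68e-5; not stated in the abstract.
N_TESTS = 1866
threshold = bonferroni_threshold(0.05, N_TESTS)
print(f"{threshold:.2e}")  # prints 2.68e-05
```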
Most phenotypes associated with smoking in published meta-analyses were nominally replicated in All of Us. The survey-based and EHR-based definitions of smoking produced similar results.
This study demonstrated the feasibility of studying common exposures using All of Us data.