Phenotype validation in electronic health records based genetic association studies.

Author information

Wang Lu, Damrauer Scott M, Zhang Hong, Zhang Alan X, Xiao Rui, Moore Jason H, Chen Jinbo

Affiliations

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.

Division of Vascular Surgery and Endovascular Therapy, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.

Publication information

Genet Epidemiol. 2017 Dec;41(8):790-800. doi: 10.1002/gepi.22080. Epub 2017 Oct 11.

PMID:29023970

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5891135/

Abstract

The linkage between electronic health records (EHRs) and genotype data makes it feasible to study the genetic susceptibility of a wide range of disease phenotypes. Although EHR-derived phenotype data are subject to misclassification, they have been shown to be useful for discovering susceptibility genes, particularly in the setting of phenome-wide association studies (PheWAS). It is essential to characterize discovered associations using gold standard phenotype data obtained by chart review. In this work, we propose a genotype stratified case-control sampling strategy to select subjects for phenotype validation. We develop a closed-form maximum-likelihood estimator for the odds ratio parameters and a score statistic for testing genetic association using the combined validated and error-prone EHR-derived phenotype data, and we assess the extent of power improvement provided by this approach. Compared with case-control sampling based only on EHR-derived phenotype data, our genotype stratified strategy maintains nominal type I error rates and results in higher power for detecting associations. It also corrects the bias in the odds ratio parameter estimates and reduces the corresponding variance, especially when the minor allele frequency is small.
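The Python sketch below illustrates two ideas from the abstract. It does not reproduce the authors' closed-form estimator, score statistic, or exact sampling fractions. The first function, `observed_odds_ratio`, makes several simplifying assumptions: a binary carrier status G, nondifferential misclassification, and hypothetical sensitivity and specificity values. Under those assumptions it shows how an odds ratio computed from the error-prone EHR phenotype S can differ from the true odds ratio for disease status Y. The second function, `genotype_stratified_sample`, is a hypothetical balanced allocation. It draws up to a fixed number of subjects for chart review from each of the four (S, G) cells, as one simple way to stratify validation sampling on both the EHR phenotype and genotype.

```python
import random
from collections import defaultdict


def observed_odds_ratio(p0, p1, sensitivity, specificity):
    """Odds ratio of the EHR-derived phenotype S comparing carriers (G=1)
    with non-carriers (G=0), given true disease risks p0 = P(Y=1 | G=0)
    and p1 = P(Y=1 | G=1). Assumes misclassification does not depend
    on genotype (nondifferential); Se and Sp are hypothetical inputs."""

    def prob_s(p):
        # P(S=1 | G) = Se * P(Y=1 | G) + (1 - Sp) * P(Y=0 | G)
        return sensitivity * p + (1 - specificity) * (1 - p)

    def odds(q):
        return q / (1 - q)

    return odds(prob_s(p1)) / odds(prob_s(p0))


def genotype_stratified_sample(ehr_phenotype, genotype, n_per_stratum, seed=0):
    """Hypothetical validation design: pick up to n_per_stratum subject
    indices from each (EHR phenotype, carrier status) cell for chart review.
    Cells smaller than n_per_stratum are taken in full."""
    strata = defaultdict(list)
    for idx, (s, g) in enumerate(zip(ehr_phenotype, genotype)):
        strata[(s, g)].append(idx)

    rng = random.Random(seed)  # fixed seed for a reproducible draw
    chosen = []
    for key in sorted(strata):
        members = strata[key]
        k = min(n_per_stratum, len(members))
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


if __name__ == "__main__":
    # True OR = (0.10/0.90) / (0.05/0.95) = 19/9, about 2.11
    print(observed_odds_ratio(0.05, 0.10, 1.0, 1.0))
    # With Se = 0.8 and Sp = 0.95 the naive OR is 73/49, about 1.49
    print(observed_odds_ratio(0.05, 0.10, 0.8, 0.95))
```

In this numerical example, the naive odds ratio based on the EHR phenotype is pulled toward the null, from about 2.11 to about 1.49. Validating a subset of subjects, for example with the stratified draw above, supplies the gold standard phenotype data needed to correct such bias.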

Cited by

Optimal multiwave validation of secondary use data with outcome and exposure misclassification.

Can J Stat. 2024 Jun;52(2):532-554. doi: 10.1002/cjs.11772. Epub 2023 Mar 31.

Clinical diagnoses associated with a positive antinuclear antibody test in patients with and without autoimmune disease.

BMC Rheumatol. 2023 Aug 7;7(1):24. doi: 10.1186/s41927-023-00349-4.

Errors in multiple variables in human immunodeficiency virus (HIV) cohort and electronic health record data: statistical challenges and opportunities.

Stat Commun Infect Dis. 2020 Oct 7;12(Suppl1):20190015. doi: 10.1515/scid-2019-0015. eCollection 2020 Sep 1.

Pleiotropy in the Genetic Predisposition to Rheumatoid Arthritis: A Phenome-Wide Association Study and Inverse Variance-Weighted Meta-Analysis.

Arthritis Rheumatol. 2020 Sep;72(9):1483-1492. doi: 10.1002/art.41291. Epub 2020 Aug 6.

The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities.

Stat Med. 2020 Mar 15;39(6):773-800. doi: 10.1002/sim.8445. Epub 2019 Dec 20.
