Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.
Bioinformatics. 2010 May 1;26(9):1205-10. doi: 10.1093/bioinformatics/btq126. Epub 2010 Mar 24.
MOTIVATION: The emergence of genetic data coupled to longitudinal electronic medical records (EMRs) makes phenome-wide association scans (PheWAS) for disease-gene associations possible. We propose a novel method to scan phenomic data for genetic associations using International Classification of Diseases, Ninth Revision (ICD9) billing codes, which are available in most EMR systems. We developed a code translation table that automatically defines 776 distinct disease populations and their controls from prevalent ICD9 codes in EMR data. As a proof of concept of this algorithm, we genotyped the first 6005 European-Americans accrued into BioVU, Vanderbilt's DNA biobank, at five single nucleotide polymorphisms (SNPs) with previously reported associations with seven diseases: atrial fibrillation, Crohn's disease, carotid artery stenosis, coronary artery disease, multiple sclerosis, systemic lupus erythematosus and rheumatoid arthritis. For each of the five SNPs, the PheWAS software generated case and control populations across all ICD9 code groups, and disease-SNP associations were analyzed. The primary outcome of this study was replication of the seven previously known SNP-disease associations.

RESULTS: The PheWAS algorithm replicated four of the seven known SNP-disease associations, with P-values between 2.8 × 10⁻⁶ and 0.011. It also identified 19 previously unknown statistical associations between these SNPs and diseases at P < 0.01. This study indicates that PheWAS analysis is a feasible method for investigating SNP-disease associations. Further evaluation is needed to determine the validity of these associations and the appropriate statistical thresholds for clinical significance.

AVAILABILITY: The PheWAS software and code translation table are freely available at http://knowledgemap.mc.vanderbilt.edu/research.
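The abstract describes the scan only at a high level. As a minimal sketch, assuming a hypothetical translation-table layout (`CodeGroup`, with case prefixes and control-exclusion prefixes), a Pearson 2×2 allelic chi-square test, a hypothetical `min_cases` cutoff, and invented toy patients, a PheWAS-style loop over code groups for one SNP might look like this. It is not the published PheWAS software, and the test statistic is an assumption rather than the method the authors report. It also computes the Bonferroni threshold that would apply if every one of the 776 code groups were tested for all five SNPs, one conservative answer to the threshold question the abstract leaves open.

```python
"""Minimal PheWAS-style scan: one SNP tested against every ICD9 code group.

Hypothetical sketch; table layout, test choice and toy data are assumptions.
"""
from dataclasses import dataclass
from math import erfc, sqrt


@dataclass(frozen=True)
class CodeGroup:
    """One row of a hypothetical code translation table."""
    name: str
    case_prefixes: tuple[str, ...]     # ICD9 prefixes that make a patient a case
    exclude_prefixes: tuple[str, ...]  # related codes that bar a patient from controls


def split_cases_controls(patients, group):
    """Split patients (id -> set of ICD9 codes) into case and control id lists."""
    cases, controls = [], []
    for pid, codes in patients.items():
        if any(code.startswith(group.case_prefixes) for code in codes):
            cases.append(pid)
        elif not any(code.startswith(group.exclude_prefixes) for code in codes):
            controls.append(pid)
        # patients with only an excluded (related) code land in neither group
    return cases, controls


def allelic_chi_square(case_genotypes, control_genotypes):
    """Pearson 2x2 allelic chi-square (1 df, no continuity correction).

    Genotypes are minor-allele counts (0, 1 or 2) per person.
    """
    a = sum(case_genotypes)                 # minor alleles in cases
    b = 2 * len(case_genotypes) - a         # major alleles in cases
    c = sum(control_genotypes)              # minor alleles in controls
    d = 2 * len(control_genotypes) - c      # major alleles in controls
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:                          # an empty row or column: no test possible
        return 0.0, 1.0
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    return chi2, erfc(sqrt(chi2 / 2))       # upper-tail P for chi-square with 1 df


def phewas_scan(patients, genotypes, table, min_cases=2):
    """Return (group, n_cases, n_controls, chi2, p) rows, smallest p first."""
    results = []
    for group in table:
        cases, controls = split_cases_controls(patients, group)
        case_g = [genotypes[pid] for pid in cases if pid in genotypes]
        ctrl_g = [genotypes[pid] for pid in controls if pid in genotypes]
        if len(case_g) < min_cases:         # skip sparse groups (hypothetical cutoff)
            continue
        chi2, p = allelic_chi_square(case_g, ctrl_g)
        results.append((group.name, len(case_g), len(ctrl_g), chi2, p))
    return sorted(results, key=lambda row: row[4])


# Bonferroni threshold if all 776 code groups were tested for each of the 5 SNPs.
N_TESTS = 776 * 5
BONFERRONI_ALPHA = 0.05 / N_TESTS          # about 1.3e-5

# Toy data invented for illustration; not from BioVU.
patients = {
    "p1": {"427.31"}, "p2": {"427.31", "401.9"}, "p3": {"401.9"},
    "p4": {"555.9"}, "p5": set(), "p6": {"427.32"},
}
genotypes = {"p1": 2, "p2": 1, "p3": 0, "p4": 1, "p5": 0, "p6": 2}
table = [
    CodeGroup("Atrial fibrillation", ("427.31",), ("427.3",)),
    CodeGroup("Crohn's disease", ("555",), ()),  # only one case: skipped by min_cases
]

results = phewas_scan(patients, genotypes, table)
for name, n_cases, n_controls, chi2, p in results:
    flag = "passes" if p < BONFERRONI_ALPHA else "fails"
    print(f"{name}: {n_cases} cases, {n_controls} controls, P = {p:.3g} ({flag} Bonferroni)")
```

In this toy run only the atrial fibrillation group has enough cases to be tested, and patient p6, coded only with the related code 427.32, is dropped from both cases and controls. That exclusion rule is one way a translation table might define controls; the abstract does not say how the published table handles related codes. The chi-square is computed by hand to stay within the standard library; a real analysis might instead use Fisher's exact test for small counts or logistic regression with covariates.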