Wei Wei-Qi, Bastarache Lisa A, Carroll Robert J, Marlo Joy E, Osterman Travis J, Gamazon Eric R, Cox Nancy J, Roden Dan M, Denny Joshua C
Departments of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States of America.
Departments of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States of America.
PLoS One. 2017 Jul 7;12(7):e0175508. doi: 10.1371/journal.pone.0175508. eCollection 2017.
We compared three groupings of electronic health record (EHR) billing codes for their ability to represent clinically meaningful phenotypes and to replicate known genetic associations. The three coding systems tested were the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes, the Agency for Healthcare Research and Quality Clinical Classification Software for ICD-9-CM (CCS), and manually curated "phecodes" designed to facilitate phenome-wide association studies (PheWAS) in EHRs.
We selected 100 disease phenotypes and compared how accurately each coding system represented them without additional grouping. The 100 phenotypes included 25 randomly chosen clinical phenotypes pursued in prior genome-wide association studies (GWAS) and another 75 common disease phenotypes mentioned across free-text problem lists from 189,289 individuals. We then evaluated how well each coding system replicated known associations for 440 SNP-phenotype pairs.
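As a rough illustration of what "grouping" billing codes means in practice, the sketch below rolls detailed codes up into coarser groups and collects the patients who qualify as cases for each group. The code labels, the `TOY_GROUP_MAP` table, and the `assign_cases` helper are hypothetical placeholders, not the published ICD-9-CM, CCS, or phecode mappings, and the case definition (at least one code in the group) is an assumption made for this example.

```python
from collections import defaultdict

# Hypothetical toy mapping from detailed billing codes to a coarser grouping.
# These entries are illustrative placeholders, not a real code map.
TOY_GROUP_MAP = {
    "X10.0": "G1",
    "X10.1": "G1",
    "X20.0": "G2",
}


def assign_cases(patient_codes, group_map):
    """Return {group: set of patient ids with at least one code in that group}."""
    cases = defaultdict(set)
    for patient_id, codes in patient_codes.items():
        for code in codes:
            group = group_map.get(code)
            if group is not None:  # codes absent from the map are ignored
                cases[group].add(patient_id)
    return dict(cases)


# Hypothetical patients and their billed codes.
patients = {
    "p1": ["X10.0", "X20.0"],
    "p2": ["X10.1"],
    "p3": ["Z99.9"],  # unmapped code
}

cases = assign_cases(patients, TOY_GROUP_MAP)
for group in sorted(cases):
    print(group, sorted(cases[group]))
```

A grouping that is too fine splits one clinical phenotype across several codes, while one that is too coarse merges distinct phenotypes into a single group; swapping in a different map changes only the `group_map` argument.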
Of the 100 clinical phenotypes tested, phecodes exactly matched 83, compared to 53 for ICD-9-CM and 32 for CCS. ICD-9-CM codes were typically too detailed (requiring custom groupings), while CCS codes were often not granular enough. Among the 440 known SNP-phenotype associations tested, phecodes replicated 153 pairs, compared to 143 for ICD-9-CM and 139 for CCS. Phecodes also generally produced stronger odds ratios and lower p-values for known associations than ICD-9-CM and CCS. Finally, evaluation of several SNPs via PheWAS identified novel potential signals, some of which were seen only with the phecode approach. Among them, rs7318369 in PEPD was associated with gastrointestinal hemorrhage.
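The counts reported above can be put on a common scale by converting them to proportions. The following sketch uses only the figures stated in this abstract; the variable names and the `rate` helper are illustrative assumptions.

```python
# Counts taken directly from the abstract.
phenotypes_tested = 100
pairs_tested = 440
exact_matches = {"phecode": 83, "ICD-9-CM": 53, "CCS": 32}
replicated = {"phecode": 153, "ICD-9-CM": 143, "CCS": 139}


def rate(count, total):
    """Proportion of `total` represented by `count`."""
    return count / total


# For each coding system: (exact-match rate, replication rate).
summary = {
    system: (
        rate(exact_matches[system], phenotypes_tested),
        rate(replicated[system], pairs_tested),
    )
    for system in exact_matches
}

for system, (match_rate, replication_rate) in summary.items():
    print(f"{system:9s} exact match {match_rate:.1%}  replicated {replication_rate:.1%}")
```

On these figures, the gap between systems is much wider for exact phenotype matching than for replication of known associations.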
Our results suggest that the phecode groupings better align with clinical diseases mentioned in clinical practice or for genomic studies. ICD-9-CM, CCS, and phecode groupings all worked for PheWAS-type studies, though the phecode groupings produced superior results.