Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA and Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA.
Bioinformatics. 2015 Jun 15;31(12):1981-7. doi: 10.1093/bioinformatics/btv076. Epub 2015 Feb 4.
Genome-wide association studies (GWASs) are effective for describing genetic complexities of common diseases. Phenome-wide association studies (PheWASs) offer an alternative and complementary approach to GWAS, using data embedded in the electronic health record (EHR) to define the phenome. International Classification of Diseases, Ninth Revision (ICD9) codes are frequently used to define the phenome, but ICD9 codes alone miss other clinically relevant information in the EHR that can be used for PheWAS analyses and discovery.
As an alternative to ICD9 coding, a text-based phenome was defined by 23 384 clinically relevant terms extracted from Marshfield Clinic's EHR. Five single nucleotide polymorphisms (SNPs) with known phenotypic associations were genotyped in 4235 individuals and tested for association across the text-based phenome. All five genotyped SNPs were associated with their expected terms (P<0.02), most at or near the top of their respective PheWAS ranking. Raw association results indicate that text data performed equivalently to ICD9 coding and demonstrate the utility of information beyond ICD9 coding for application in PheWAS.
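The core PheWAS step described above, testing one SNP against every term in the phenome and ranking the terms by P value, can be illustrated with a minimal sketch. The abstract does not state which statistical test or genetic model was used, so the two-sided Fisher exact test on carrier status below is an assumption chosen for simplicity, and the function names, patient IDs and term labels (`term_A`, `term_B`) are hypothetical toy values, not data from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def phewas(carriers, term_to_patients, all_patients):
    """Test one SNP (as a set of carrier IDs) against every term; rank by P value."""
    results = []
    for term, patients_with_term in term_to_patients.items():
        with_term = patients_with_term & all_patients
        a = len(carriers & with_term)      # carrier, term present
        b = len(carriers - with_term)      # carrier, term absent
        c = len(with_term - carriers)      # non-carrier, term present
        d = len(all_patients) - a - b - c  # non-carrier, term absent
        results.append((term, fisher_exact_two_sided(a, b, c, d)))
    return sorted(results, key=lambda item: item[1])


# Hypothetical toy cohort: 20 patients, 10 of whom carry the variant.
all_patients = set(range(1, 21))
carriers = set(range(1, 11))
term_to_patients = {
    "term_A": {1, 2, 3, 4, 5, 6, 7, 8},  # concentrated in carriers
    "term_B": {1, 2, 11, 12},            # evenly split
}

ranked = phewas(carriers, term_to_patients, all_patients)
for term, p_value in ranked:
    print(f"{term}\t{p_value:.4g}")
```

In a real analysis with tens of thousands of terms, a library implementation of the test, covariate adjustment and multiple-testing control would normally be needed; none of these is specified in the abstract.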