Namjou Bahram, Marsolo Keith, Caroll Robert J, Denny Joshua C, Ritchie Marylyn D, Verma Shefali S, Lingren Todd, Porollo Aleksey, Cobb Beth L, Perry Cassandra, Kottyan Leah C, Rothenberg Marc E, Thompson Susan D, Holm Ingrid A, Kohane Isaac S, Harley John B
Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; College of Medicine, University of Cincinnati, Cincinnati, OH, USA.
College of Medicine, University of Cincinnati, Cincinnati, OH, USA; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.
Front Genet. 2014 Nov 18;5:401. doi: 10.3389/fgene.2014.00401. eCollection 2014.
We report the first pediatric-specific Phenome-Wide Association Study (PheWAS) using electronic medical records (EMRs). Given the early success of PheWAS in adult populations, we investigated the feasibility of this approach in pediatric cohorts by evaluating associations between previously known genetic variants and a wide range of clinical or physiological traits. Although computationally intensive, this approach has the potential to reveal mechanistic relationships between a variant and a network of phenotypes.
Data on 5049 samples of European ancestry were obtained from the EMRs of two large academic centers, spanning five genotyped cohorts. These samples recently underwent whole-genome imputation. After standard quality control, removal of missing data, and exclusion of outliers based on principal components analysis (PCA), 4268 samples were used for the PheWAS. We scanned for associations between 539 EMR-derived phenotypes and 2476 single-nucleotide polymorphisms (SNPs) that were reported in previously published GWAS and had genotyping data available. The false discovery rate was calculated and, for any new PheWAS findings, a permutation approach (with up to 1,000,000 trials) was implemented.
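The two statistical steps named above, false discovery rate control and a permutation test for new findings, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say which FDR procedure or test statistic was used, so the Benjamini-Hochberg procedure and a difference in mean allele count between cases and controls are assumptions, and the function names `benjamini_hochberg` and `permutation_pvalue` are hypothetical. For scale, if every SNP were tested against every phenotype, the scan would cover 2476 × 539 = 1,334,564 SNP-phenotype pairs.

```python
import random


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the abstract only says an FDR was calculated; BH is used
    here as one common choice.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def permutation_pvalue(allele_counts, is_case, max_trials=10_000, seed=0):
    """Empirical p-value for one SNP-phenotype pair by label permutation.

    allele_counts: minor-allele count (0, 1 or 2) per sample.
    is_case: True/False phenotype label per sample (both classes present).
    The statistic (absolute difference in mean allele count) is an
    illustrative assumption.
    """
    rng = random.Random(seed)

    def statistic(labels):
        cases = [g for g, c in zip(allele_counts, labels) if c]
        controls = [g for g, c in zip(allele_counts, labels) if not c]
        return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

    observed = statistic(is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(max_trials):
        rng.shuffle(labels)  # break any genotype-phenotype link
        if statistic(labels) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (max_trials + 1)


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
    genotypes = [2] * 10 + [0] * 10
    status = [True] * 10 + [False] * 10
    print(permutation_pvalue(genotypes, status, max_trials=2000))
```

With the add-one correction, the smallest p-value the permutation estimate can return is 1 / (max_trials + 1), which is why resolving p-values near 10⁻⁶ requires on the order of a million trials.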
This PheWAS replicated prior GWAS associations for a variety of common variants (MAF > 10%) in our pediatric cohorts, including associations with Juvenile Rheumatoid Arthritis (JRA), Asthma, Autism and Pervasive Developmental Disorder (PDD), and Type 1 Diabetes, with a false discovery rate < 0.05 and study power above 80%. In addition, several new PheWAS findings were identified: a cluster of associations near the NDFIP1 gene with mental retardation (best SNP rs10057309, p = 4.33 × 10⁻⁷, OR = 1.70, 95% CI = 1.38-2.09); an association near the PLCL1 gene with developmental delays and speech disorder (best SNP rs1595825, p = 1.13 × 10⁻⁸, OR = 0.65, 95% CI = 0.57-0.76); a cluster of associations in the IL5-IL13 region, previously implicated in asthma, allergy, and eosinophilia, with Eosinophilic Esophagitis (EoE) (best SNP rs12653750, p = 3.03 × 10⁻⁹, OR = 1.73, 95% CI = 1.44-2.07); and an association of variants in GCKR and JAZF1, previously linked to metabolic disease and diabetes in adults, with allergic rhinitis in our pediatric cohorts (best SNP rs780093, p = 2.18 × 10⁻⁵, OR = 1.39, 95% CI = 1.19-1.61).
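As a rough plausibility check on the effect sizes reported above, an odds ratio and its 95% confidence interval can be converted back into an approximate Wald z-score and two-sided p-value. The sketch below assumes the interval is symmetric on the log-odds scale (a standard Wald interval); the abstract does not state how the intervals or p-values were derived, so only order-of-magnitude agreement with the reported p-values should be expected. The helper name `wald_from_ci` is hypothetical.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover (standard error, z, two-sided p) from an OR and its 95% CI.

    Assumes a Wald interval: log(OR) +/- z_crit * SE.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided normal tail probability via the complementary error function.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


if __name__ == "__main__":
    # Values for rs12653750 (IL5-IL13 region, EoE) as reported in the abstract.
    se, z, p = wald_from_ci(1.73, 1.44, 2.07)
    print(f"SE = {se:.4f}, z = {z:.2f}, p = {p:.2e}")
```

For rs12653750 this gives a p-value of the same order of magnitude as the reported 3.03 × 10⁻⁹; small differences are expected because the published odds ratio and interval are rounded to two decimals.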
As in previous adult studies, the PheWAS approach with re-mapped ICD-9 structured codes, applied to our European-ancestry pediatric cohorts, recovers many previously reported associations and also identifies new associations with potentially important clinical implications.