Li Haiquan, Fan Jungwei, Vitali Francesca, Berghout Joanne, Aberasturi Dillon, Li Jianrong, Wilson Liam, Chiu Wesley, Pumarejo Minsu, Han Jiali, Kenost Colleen, Koripella Pradeep C, Pouladi Nima, Billheimer Dean, Bedrick Edward J, Lussier Yves A
Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, 85721, USA.
Department of Medicine at the College of Medicine-Tucson, The University of Arizona, Tucson, AZ, 85721, USA.
BMC Med Genomics. 2018 Dec 31;11(Suppl 6):112. doi: 10.1186/s12920-018-0428-9.
Forty-two percent of patients experience disease comorbidity, which contributes substantially to mortality and to increased healthcare costs. Yet the possibility that comorbid diseases share underlying mechanisms is not well established, and few studies have confirmed their molecular predictions with clinical datasets.
In this work, we integrated genome-wide association study (GWAS) data, which associate diseases with single nucleotide polymorphisms (SNPs), with transcript regulatory activity from expression quantitative trait loci (eQTL). This allowed novel mechanistic insights into noncoding and intergenic regions. We then analyzed pairs of SNPs across diseases to identify shared molecular effectors that were robust to multiple-testing correction (false discovery rate, FDR < 0.05). We hypothesized that disease pairs found to be molecularly convergent would also be significantly overrepresented among comorbidities in clinical datasets. To assess this hypothesis, we used clinical claims datasets from the Healthcare Cost and Utilization Project (HCUP) and calculated significant disease comorbidities (FDR < 0.05). Finally, we used Fisher's exact test to verify whether molecularly convergent disease pairs were also comorbid more often than expected by chance.
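The final step above, testing whether molecularly convergent disease pairs are overrepresented among comorbid pairs, can be sketched as a one-sided Fisher's exact test on a 2x2 table. This is a minimal illustration, not the authors' code: the function names and the toy counts are hypothetical, and the one-sided alternative is an assumption, since the abstract does not state which alternative was used.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: molecularly convergent / not convergent disease pairs.
    Columns: comorbid / not comorbid disease pairs.
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    row1 = a + b          # number of convergent pairs
    col1 = a + c          # number of comorbid pairs
    n = a + b + c + d     # all disease pairs considered
    k_max = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, k_max + 1)
    )
    return numerator / comb(n, col1)


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table (undefined when b * c == 0)."""
    return (a * d) / (b * c)


# Toy counts for illustration only; these are NOT the study's data.
toy_table = (3, 1, 1, 3)
toy_p = fisher_exact_greater(*toy_table)
toy_or = odds_ratio(*toy_table)
```

With real counts, a small p-value together with an odds ratio above 1 would indicate that convergent pairs are comorbid more often than expected by chance.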
Our approach integrates: (i) 6175 SNPs associated with 238 diseases from ~1000 GWAS, (ii) eQTL associations from 19 tissues, and (iii) claims data for 35 million patients from HCUP. Logistic regression (controlled for age, gender, and race) identified comorbidities in HCUP, while enrichment analyses identified cis- and trans-eQTL downstream effectors of GWAS-identified variants. Among ~16,000 combinations of diseases, 398 disease pairs were prioritized by both convergent eQTL genetics (RNA overlap enrichment, FDR < 0.05) and clinical comorbidity (OR > 1.5, FDR < 0.05). Case studies of comorbidities illustrate specific convergent noncoding regulatory elements. Convergent mechanisms between distinct diseases, derived from GWAS and eQTL data, were overrepresented among the comorbidities observed in clinical datasets (OR = 8.6, p-value = 6.4 × 10 FET), unveiling an intergenic architecture of disease comorbidity.
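The comorbidity filter described above (OR > 1.5, FDR < 0.05) can be sketched as follows. The abstract does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption; the function names, pair labels, odds ratios, and p-values are likewise hypothetical placeholders rather than study results.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_comorbid_pairs(records, or_cutoff=1.5, fdr_cutoff=0.05):
    """Keep pairs whose odds ratio and adjusted p-value pass both cutoffs.

    `records` is a list of (pair_label, odds_ratio, p_value) tuples.
    """
    qvalues = benjamini_hochberg([p for _, _, p in records])
    return [
        label
        for (label, oratio, _), q in zip(records, qvalues)
        if oratio > or_cutoff and q < fdr_cutoff
    ]


# Hypothetical records for illustration only; NOT values from the study.
toy_records = [
    ("pair_1", 2.3, 0.01),
    ("pair_2", 1.8, 0.04),
    ("pair_3", 1.1, 0.03),
    ("pair_4", 3.0, 0.20),
]
toy_qvalues = benjamini_hochberg([p for _, _, p in toy_records])
toy_selected = select_comorbid_pairs(toy_records)
```

Applying both cutoffs jointly means that a pair with a large odds ratio but a weak adjusted p-value (or the reverse) is not retained.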
These comorbid diseases with convergent eQTL genetic mechanisms suggest clinical syndromes. While it took over a decade to confirm the genetic underpinning of the metabolic syndrome, this study likely highlights hundreds of new ones. Further, this knowledge may improve the precision of clinical management of comorbidities and shed light on novel approaches to drug repositioning or SNP-guided precision molecular therapy that includes intergenic risks.