Interdisciplinary Program of Bioinformatics, Seoul National University, Seoul, 151-747, Korea.
Department of Computer Science, Technion - Israel Institute of Technology, Haifa, 3200003, Israel.
Sci Rep. 2017 Sep 4;7(1):8416. doi: 10.1038/s41598-017-08468-y.
Functional rare variants in drug-related genes are believed to be highly differentiated between ethnic or racial populations. However, knowledge of the population differentiation (PD) of rare single-nucleotide variants (SNVs) remains scarce, and the highest fixation index (F_ST value) among the rare and common variants annotated to a gene has been used only marginally to understand PD at the gene level. In this study, we propose a new gene-based PD method, PD of Rare and Common variants (PDRC), inspired by Generalized Cochran-Mantel-Haenszel (GCMH) statistics, to analyze rare variants and identify highly population-differentiated drug response-related genes ("pharmacogenes"). Simulation studies show that PDRC adequately summarizes the PD of the rare and common variants within a specific gene. We also applied the proposed method to real whole-exome sequencing data, consisting of 10,000 datasets from the Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples (T2D-GENES) initiative and 3,000 datasets from the Genetics of Type 2 diabetes (Go-T2D) repository. Among the 48 genes annotated with Very Important Pharmacogenetic summaries (VIPgenes) in the PharmGKB database, our PD method identified candidate genes with high PD, including ACE, CYP2B6, DPYD, F5, MTHFR, and SCN5A.
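As a point of reference for the gene-level summary the abstract contrasts PDRC with, the following is a minimal sketch of taking the highest per-variant fixation index (commonly written F_ST) within a gene. It is not the PDRC method. The estimator used here, a sample-size-weighted Wright-style F_ST = (H_T - H_S) / H_T for biallelic variants, and the function names `fst_biallelic` and `max_fst_per_gene` are assumptions made for illustration; the abstract does not specify which estimator was used.

```python
from typing import Dict, List, Sequence, Tuple

# One variant: (alternate-allele counts per population, total alleles per population)
Variant = Tuple[Sequence[int], Sequence[int]]


def fst_biallelic(alt_counts: Sequence[int], allele_totals: Sequence[int]) -> float:
    """Wright-style F_ST = (H_T - H_S) / H_T for one biallelic variant.

    Assumption: populations are weighted by their number of sampled alleles.
    Returns 0.0 when the variant is monomorphic in the pooled sample.
    """
    total = sum(allele_totals)
    if total == 0:
        return 0.0
    p_bar = sum(alt_counts) / total          # pooled alternate-allele frequency
    h_t = 2.0 * p_bar * (1.0 - p_bar)        # expected heterozygosity, pooled
    if h_t == 0.0:
        return 0.0
    # Weighted mean of within-population expected heterozygosity
    h_s = sum(
        n * 2.0 * (a / n) * (1.0 - a / n)
        for a, n in zip(alt_counts, allele_totals)
        if n > 0
    ) / total
    return (h_t - h_s) / h_t


def max_fst_per_gene(variants_by_gene: Dict[str, List[Variant]]) -> Dict[str, float]:
    """Summarize each gene by the highest F_ST among its annotated variants."""
    return {
        gene: max(fst_biallelic(alt, tot) for alt, tot in variants)
        for gene, variants in variants_by_gene.items()
        if variants
    }


if __name__ == "__main__":
    # Hypothetical counts for two populations with 200 sampled alleles each.
    toy = {
        "GENE_A": [((1, 0), (200, 200)), ((60, 140), (200, 200))],
        "GENE_B": [((2, 0), (200, 200))],
    }
    for gene, value in max_fst_per_gene(toy).items():
        print(gene, round(value, 4))
```

In this toy input, a singleton seen in only one population yields a very small F_ST even though it is private to that population, which illustrates why a maximum-F_ST summary tends to be driven by common variants and why a method aimed at rare variants may be useful.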