Mei Hao, Li Lianna, Liu Shijian, Jiang Fan, Griswold Michael, Mosley Thomas
Center of Biostatistics & Bioinformatics, University of Mississippi Medical Center, Jackson, MS, USA.
Shanghai Children's Medical Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
BMC Genomics. 2015 Apr 23;16(1):336. doi: 10.1186/s12864-015-1515-3.
Genetic heritability and expression studies have shown that different diabetes traits share common genetic components and pathways. Computationally efficient pathway analysis of GWAS results benefits the post-GWAS study of SNP associations, and identifying common genetic pathways from diabetes GWAS can improve understanding of disease pathogenesis.
We propose a uniform-score gene-set analysis (USGSA), implemented as a software package, that unifies different gene measures through a uniform score for identifying pathways from GWAS data and uses a pre-generated permutation distribution table to quickly obtain multiple-testing-adjusted p-values. Simulation studies of the uniform score for four gene measures (minP, 2ndP, simP and fishP) showed that USGSA strictly controls the family-wise error rate, while power depends on the type of gene measure. USGSA with a two-stage study strategy was applied to identify common pathways associated with diabetes traits, based on public dbGaP GWAS results. The study identified 7 gene sets whose component genes carry binding motifs in their promoter regions for 5 transcription factors (TFs: FOXO4, TCF3, NFAT, VSX1 and POU2F1) and 1 microRNA (mir-218). These gene sets include 25 common genes that rank among the top 5% of genome-wide gene associations in all GWAS. Previous evidence indicates that nearly all of these genes are mainly expressed in the brain.
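The abstract does not define the four gene measures or the uniform score, so the following is only an illustrative sketch under stated assumptions, not the USGSA package itself. It assumes minP is the smallest SNP p-value in a gene, 2ndP the second smallest, simP a Simes-combined p-value, and fishP a Fisher-combined p-value (treating SNPs as independent); it further assumes the uniform score is a gene's rank divided by the number of genes. All function names are hypothetical.

```python
import math

def min_p(pvals):
    """Assumed minP: smallest SNP p-value mapped to the gene."""
    return min(pvals)

def second_p(pvals):
    """Assumed 2ndP: second-smallest SNP p-value (falls back to the only one)."""
    s = sorted(pvals)
    return s[1] if len(s) > 1 else s[0]

def simes_p(pvals):
    """Assumed simP: Simes combination, min over i of k * p_(i) / i."""
    s = sorted(pvals)
    k = len(s)
    return min(k * p / (i + 1) for i, p in enumerate(s))

def fisher_p(pvals):
    """Assumed fishP: Fisher combination under independence.

    The statistic x = -2 * sum(ln p) is chi-square with 2k degrees of
    freedom; for even degrees of freedom the upper tail has the closed form
    exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(pvals)
    half = -sum(math.log(p) for p in pvals)  # this is x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)

def uniform_scores(measures):
    """Assumed uniform score: rank / n, so the strongest gene scores 1/n.

    Ties are broken arbitrarily here; a real implementation would need a
    deliberate tie rule.
    """
    ordered = sorted(measures, key=measures.get)
    n = len(ordered)
    return {gene: (rank + 1) / n for rank, gene in enumerate(ordered)}

def top_genes(scores, alpha=0.05):
    """Genes whose uniform score falls within the top alpha fraction."""
    return {gene for gene, u in scores.items() if u <= alpha}

def common_top_genes(studies, alpha=0.05):
    """Genes in the top alpha fraction of every study (dict of gene -> measure)."""
    tops = [top_genes(uniform_scores(m), alpha) for m in studies]
    return set.intersection(*tops) if tops else set()
```

Because the rank-based score depends only on the ordering of genes, any of the four measures can be swapped in without changing the downstream steps, which is one way to read the "unify different gene measures" claim. The intersection in `common_top_genes` mirrors the idea of genes that are in the top 5% for all GWAS, again only as an assumed reading.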
USGSA is a computationally efficient approach to pathway analysis of GWAS data that improves interpretability and comparability. The pathway analysis suggests that different diabetes traits share common pathways and that the component genes are potentially regulated by common TFs and a common microRNA. The results also indicate that the central nervous system plays a critical role in diabetes pathogenesis. These findings will be important in formulating novel hypotheses to guide follow-up studies.