Park Sung Hee, Kim Sangsoo
Department of Bioinformatics and Life Sciences, Soongsil University, Seoul 156-743, South Korea.
Int J Data Min Bioinform. 2012;6(5):505-20.
Genome-wide association studies (GWAS) have played a crucial role in identifying disease susceptibility loci for single traits. However, GWAS have been limited in detecting genetic risk factors for multivariate phenotypes that arise from pleiotropic effects of genetic loci. This work reports a data mining approach that discovers patterns of multivariate phenotypes expressed as association rules, and presents an analytical scheme for GWAS of those newly defined multivariate phenotypes. We identified 13 SNPs for four genes (CSMD1, NFE2L1, CBX1, and SKAP1) associated with a new multivariate phenotype defined as low levels of low-density lipoprotein cholesterol (LDL-C ≤ 100 mg/dl) and high levels of triglycerides (TG ≥ 180 mg/dl). Compared with a traditional approach to GWAS, using the discovered multivariate phenotypes can be advantageous for identifying pleiotropic genetic risk factors, which may have a common etiological role for the multivariate phenotypes.
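As a rough illustration of the phenotype definition above, the following Python sketch labels subjects as cases for the combined phenotype (low LDL-C and high TG) and scores the candidate rule "low LDL-C implies high TG" with the usual association-rule measures (support, confidence, lift). The abstract does not say which mining algorithm or rule measures the authors used, so the choice of measures, the names `Subject`, `is_case`, and `rule_metrics`, and the five toy records are all assumptions made for illustration. Only the two thresholds come from the abstract.

```python
from dataclasses import dataclass

# Thresholds taken from the abstract (mg/dl).
LDL_MAX = 100.0  # "low" LDL-C means LDL-C <= 100
TG_MIN = 180.0   # "high" TG means TG >= 180


@dataclass
class Subject:
    """Hypothetical record holding the two lipid measurements."""
    ldl_c: float
    tg: float


def low_ldl(s: Subject) -> bool:
    return s.ldl_c <= LDL_MAX


def high_tg(s: Subject) -> bool:
    return s.tg >= TG_MIN


def is_case(s: Subject) -> bool:
    """A subject is a case when both conditions of the phenotype hold."""
    return low_ldl(s) and high_tg(s)


def rule_metrics(subjects):
    """Support, confidence, and lift for the rule low LDL-C -> high TG."""
    n = len(subjects)
    n_a = sum(low_ldl(s) for s in subjects)   # antecedent count
    n_b = sum(high_tg(s) for s in subjects)   # consequent count
    n_ab = sum(is_case(s) for s in subjects)  # both conditions
    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    lift = confidence / (n_b / n) if n_b else 0.0
    return support, confidence, lift


# Invented toy values, not data from the study.
toy = [
    Subject(95, 210),
    Subject(100, 180),
    Subject(130, 150),
    Subject(90, 120),
    Subject(160, 250),
]

labels = [is_case(s) for s in toy]
support, confidence, lift = rule_metrics(toy)
print(labels, support, confidence, lift)
```

In a GWAS setting, the boolean labels would serve as the case/control status tested against each SNP. The association test itself is outside the scope of this sketch.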