Lin Ying-Chao, Hsieh Ai-Ru, Hsiao Ching-Lin, Wu Shang-Jung, Wang Hui-Min, Lian Ie-Bin, Fann Cathy S J
Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan.
Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan.
J Biomed Sci. 2014 Aug 30;21(1):88. doi: 10.1186/s12929-014-0088-9.
Genome-wide association studies have been successful in identifying common genetic variants for human diseases. However, much of the heritable variation associated with diseases such as Parkinson's disease remains unexplained, suggesting that many more risk loci are yet to be identified. Rare variants have become an important focus of disease association studies as a potential source of the missing heritability. Methods for detecting this type of association require prior knowledge of candidate genes and combine the variants within a region. These methods may lose power when a region contains many neutral variants or causal variants with opposite effects.
We propose a method that scans genetic variants to identify the region most likely to harbour a disease gene with rare and/or common causal variants. Our method assigns a score to each individual variant using our scoring system and uses aggregate scores to identify the region associated with disease. We evaluate performance by simulation based on 1000 Genomes sequencing data and compare it with three commonly used methods (CMC, WSS and SKAT-O). We use a Parkinson's disease case-control dataset as a model to demonstrate the application of our method. Under simulation based on 1000 Genomes sequencing data, our method has better power than CMC and WSS and similar power to SKAT-O, with well-controlled type I error. In real data analysis, we confirm the association of the α-synuclein gene (SNCA) with Parkinson's disease (p = 0.005). We further identify associations with hyaluronan synthase 2 (HAS2, p = 0.028) and kringle containing transmembrane protein 1 (KREMEN1, p = 0.006). KREMEN1 is associated with the Wnt signalling pathway, which has been shown to play an important role in neurodegeneration in Parkinson's disease.
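The abstract does not spell out the scoring system, so the following is only a hypothetical sketch of the general idea of scanning aggregated per-variant scores, not the authors' method. It assumes a per-variant score equal to the absolute difference in minor-allele frequency between cases and controls, sums these scores over a fixed-width sliding window of adjacent variants, and assesses the best window by permuting case/control labels. The function names `variant_scores`, `best_window` and `scan_pvalue`, and the fixed `width` parameter, are illustrative assumptions.

```python
import random


def variant_scores(genotypes, is_case):
    """Hypothetical per-variant score: |case MAF - control MAF| (an assumption, not the paper's score).

    genotypes: one list per individual of minor-allele counts (0, 1 or 2),
               one entry per variant, variants in genomic order.
    is_case:   one bool per individual (True = case, False = control).
    """
    cases = [g for g, c in zip(genotypes, is_case) if c]
    controls = [g for g, c in zip(genotypes, is_case) if not c]
    scores = []
    for j in range(len(genotypes[0])):
        f_case = sum(g[j] for g in cases) / (2 * len(cases))
        f_ctrl = sum(g[j] for g in controls) / (2 * len(controls))
        # Absolute difference: risk and protective variants both raise the score.
        scores.append(abs(f_case - f_ctrl))
    return scores


def best_window(scores, width):
    """Return (start_index, total) of the width-variant window with the largest summed score."""
    if not 1 <= width <= len(scores):
        raise ValueError("width must be between 1 and the number of variants")
    total = sum(scores[:width])
    best_start, best_total = 0, total
    # Slide the window one variant at a time, updating the running sum.
    for start in range(1, len(scores) - width + 1):
        total += scores[start + width - 1] - scores[start - 1]
        if total > best_total:
            best_start, best_total = start, total
    return best_start, best_total


def scan_pvalue(genotypes, is_case, width, n_perm=1000, seed=0):
    """Best window plus a permutation p-value for its aggregate score.

    Labels are shuffled and the maximum over all windows is recomputed each
    time, so the p-value accounts for having scanned many windows.
    """
    rng = random.Random(seed)
    start, observed = best_window(variant_scores(genotypes, is_case), width)
    labels = list(is_case)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        _, perm_total = best_window(variant_scores(genotypes, labels), width)
        if perm_total >= observed:
            exceed += 1
    return start, observed, (exceed + 1) / (n_perm + 1)
```

For example, with 10 cases each carrying one minor allele at variants 4 to 6 and 10 controls carrying none, `scan_pvalue(geno, status, width=3)` reports the window starting at index 4. Using the absolute frequency difference is one simple way to keep opposite-direction effects from cancelling; the single fixed window width is a simplification, and scanning several widths would require taking the permutation maximum over all widths as well.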
Our method is time efficient and less sensitive to the inclusion of neutral variants and to the direction of effect of causal variants. It can narrow down a genomic region or a whole chromosome to a disease-associated region. Using Parkinson's disease as a model, our method not only confirms the association of a known gene but also identifies two genes previously reported by other studies. Although many methods already exist, we conclude that our method serves as an efficient alternative for exploring genomic data containing both rare and common variants.