Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
Department of Pharmaceutical Sciences, Departments of Psychiatry, and Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA.
BMC Med Genomics. 2020 Feb 24;13(Suppl 3):19. doi: 10.1186/s12920-020-0667-4.
The current understanding of the genetic basis of complex human diseases is that they are caused and affected by many common and rare genetic variants. A considerable number of disease-associated variants have been identified by genome-wide association studies (GWAS); however, they explain only a small proportion of heritability. One possible reason for the missing heritability is that many undiscovered disease-causing variants are only weakly associated with the disease. This poses a serious challenge to many statistical methods, which seem capable of identifying only disease-associated variants with relatively strong coefficients.
To help identify weaker variants, we propose a novel statistical method, the Constrained Sparse multi-locus Linear Mixed Model (CS-LMM), which aims to uncover genetic variants with weaker associations by incorporating known associations as prior knowledge in the model. Moreover, CS-LMM accounts for polygenic effects and corrects for complex relatedness. Our simulation experiments show that CS-LMM outperforms competing methods in various settings in which the combinations of minor allele frequencies (MAFs) and coefficients reflect different scenarios in complex human diseases.
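The abstract does not state the CS-LMM objective, so the following is only a minimal sketch of one possible reading of "incorporating known associations as prior knowledge": a sparse regression in which the known variants are left unpenalized while all other candidate variants receive an L1 penalty. It is not the authors' implementation, and it omits the mixed-model correction for polygenic effects and relatedness. The function names, the penalty value, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(x, t):
    """Element-wise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def partially_penalized_lasso(X, y, known_idx, lam, n_iter=500):
    """Proximal gradient descent for
        (1 / 2n) * ||y - X b||^2 + lam * sum_{j not in known_idx} |b_j|.
    Coefficients of the known variants are NOT penalized (illustrative
    assumption, not the published CS-LMM formulation)."""
    n, p = X.shape
    penalized = np.ones(p, dtype=bool)
    penalized[known_idx] = False
    # Step size 1/L, where L = ||X||_2^2 / n is the gradient's Lipschitz constant.
    step = n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad
        # Shrink only the penalized coordinates; known ones take a plain gradient step.
        beta = np.where(penalized, soft_threshold(z, step * lam), z)
    return beta


# Hypothetical simulated data: one strong known variant and one weak unknown variant.
rng = np.random.default_rng(0)
n_samples, n_variants = 200, 50
X = rng.standard_normal((n_samples, n_variants))
true_beta = np.zeros(n_variants)
true_beta[0] = 2.0   # known, strong association
true_beta[1] = 0.3   # unknown, weak association
y = X @ true_beta + 0.5 * rng.standard_normal(n_samples)

beta_hat = partially_penalized_lasso(X, y, known_idx=[0], lam=0.1)
```

Because the known variant is fitted without shrinkage, the signal it explains is removed from the residual before the penalty decides which of the remaining variants to keep. That is the intuition this sketch is meant to convey, nothing more.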
We also apply our method to GWAS data on alcoholism and Alzheimer's disease and, in an exploratory analysis, discover several SNPs. Many of these discoveries are supported by a literature survey. Furthermore, our association results strengthen the belief in genetic links between alcoholism and Alzheimer's disease.