Jung Jeesun, Dantzer Jessica, Liu Yunlong
Department of Medical and Molecular Genetics, Indiana University School of Medicine, IB 130, 975 West Walnut Street, Indianapolis, IN 46202, USA.
BMC Proc. 2011 Nov 29;5 Suppl 9(Suppl 9):S103. doi: 10.1186/1753-6561-5-S9-S103.
Advances in sequencing technologies have made it possible to identify rare variants responsible for complex diseases. However, statistical methods that can handle the vast amount of data generated and interpret the complicated relationship between disease and these variants have lagged behind. We apply a zero-inflated Poisson regression model to account for the excess of zeros caused by the extremely low frequency of the 24,487 exonic variants in the Genetic Analysis Workshop 17 data. Based on principal components analysis, we grouped the 697 subjects in the data set as Europeans, Asians, and Africans, and for each individual we counted the total number of rare variants per gene. We then analyzed these collapsed variants under the assumption that rare variants are enriched in a group of people affected by a disease compared with a group of unaffected people. We also tested this hypothesis with quantitative traits Q1, Q2, and Q4. Analyses of the combined 697 individuals and of each ethnic group yielded different results. In the combined population analysis, we found that UGT1A1, which was not part of the simulation model, was associated with disease liability and that FLT1, a causal locus in the simulation model, was associated with Q1. Of the causal loci in the simulation models, FLT1 and KDR were associated with Q1 and VNN1 was correlated with Q2. No significant genes were associated with Q4. These results show the feasibility and capability of our new statistical model to detect multiple rare variants influencing disease risk.
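The collapsing and zero-inflated Poisson (ZIP) steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions not stated in the abstract: a hypothetical minor allele frequency cutoff of 0.01 defines "rare"; an individual's per-gene count is the number of rare variant sites in that gene where they carry at least one minor allele; the zero-inflation probability π is constant across individuals; and affection status is the only covariate of the Poisson mean. Under these assumptions, P(Y = 0) = π + (1 − π)e^(−λ) and P(Y = k) = (1 − π)λ^k e^(−λ) / k! for k > 0. The model is fit by expectation-maximization (EM), and enrichment is assessed with a likelihood-ratio test comparing separate versus shared Poisson means for affected and unaffected individuals. All function names are hypothetical.

```python
import math


def collapse_rare_variants(individuals, genotypes, variant_gene, maf, maf_cutoff=0.01):
    """Per gene, count the rare variant sites at which each individual carries a minor allele.

    individuals:  list of individual IDs (everyone gets a count, including zeros)
    genotypes:    {individual: {variant: minor-allele count 0/1/2}}; missing entries = 0
    variant_gene: {variant: gene}
    maf:          {variant: minor allele frequency}
    maf_cutoff:   hypothetical rarity threshold, not taken from the paper
    """
    genes = sorted(set(variant_gene.values()))
    counts = {gene: {person: 0 for person in individuals} for gene in genes}
    for person in individuals:
        for variant, n_minor in genotypes.get(person, {}).items():
            if maf[variant] < maf_cutoff and n_minor > 0:
                counts[variant_gene[variant]][person] += 1
    return counts


def zip_loglik(y, x, pi, lam):
    """ZIP log-likelihood; lam[k] is the Poisson mean for group k (x is 0 or 1)."""
    ll = 0.0
    for yi, xi in zip(y, x):
        mu = lam[xi]
        if yi == 0:
            ll += math.log(pi + (1 - pi) * math.exp(-mu))
        else:
            ll += math.log(1 - pi) + yi * math.log(mu) - mu - math.lgamma(yi + 1)
    return ll


def fit_zip(y, x, shared_mean=False, max_iter=1000, tol=1e-10):
    """EM fit with a constant inflation probability pi and one Poisson mean per group.

    shared_mean=True forces both groups to share one mean (the null model).
    """
    pi, lam = 0.5, {0: 1.0, 1: 1.0}  # starting values
    groups = [(0, 1)] if shared_mean else [(0,), (1,)]
    prev_ll = -math.inf
    ll = prev_ll
    for _ in range(max_iter):
        # E step: posterior probability that each observed zero is a structural zero
        z = [pi / (pi + (1 - pi) * math.exp(-lam[xi])) if yi == 0 else 0.0
             for yi, xi in zip(y, x)]
        # M step: pi is the mean posterior weight; each Poisson mean is a
        # (1 - z)-weighted average count, the closed-form MLE for a binary covariate
        pi = sum(z) / len(z)
        for g in groups:
            w = sum(1 - zi for zi, xi in zip(z, x) if xi in g)
            s = sum((1 - zi) * yi for zi, yi, xi in zip(z, y, x) if xi in g)
            for k in g:
                lam[k] = s / w if w > 0 else 0.0
        ll = zip_loglik(y, x, pi, lam)
        if ll - prev_ll < tol:  # EM never decreases the likelihood
            break
        prev_ll = ll
    return pi, lam, ll


def zip_enrichment_test(y, affected):
    """Likelihood-ratio test (1 df) for different Poisson means in affected vs unaffected."""
    x = [1 if a else 0 for a in affected]
    *_, ll_null = fit_zip(y, x, shared_mean=True)
    _, lam, ll_alt = fit_zip(y, x)
    stat = max(0.0, 2 * (ll_alt - ll_null))  # clamp tiny negative values from EM tolerance
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return lam, stat, p_value


if __name__ == "__main__":
    # Made-up counts for one gene in 8 individuals (not GAW17 data)
    y = [0, 0, 0, 1, 0, 2, 3, 1]
    affected = [False] * 4 + [True] * 4
    lam, stat, p = zip_enrichment_test(y, affected)
    print(f"mean unaffected={lam[0]:.3f} affected={lam[1]:.3f} LRT={stat:.3f} p={p:.3g}")
```

Because the covariate here is binary, the weighted Poisson M-step has a closed form (a weighted mean count per group), so no numerical optimizer is needed. With a continuous covariate, such as one of the quantitative traits Q1, Q2, or Q4, the M-step would instead require iteratively reweighted least squares or a general-purpose optimizer.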