Cherkas Yauheniya, Raghavan Nandini, Francke Stephan, Defalco Frank, Wilcox Marsha A
Epidemiology, Johnson & Johnson, 1125 Trenton-Harbourton Road, Titusville, NJ 08560, USA.
BMC Proc. 2011 Nov 29;5 Suppl 9(Suppl 9):S94. doi: 10.1186/1753-6561-5-S9-S94.
In addition to methods that can identify common variants associated with susceptibility to common diseases, there has been increasing interest in approaches that can identify rare genetic variants. We use the simulated data provided to the participants of Genetic Analysis Workshop 17 (GAW17) to identify both rare and common single-nucleotide polymorphisms (SNPs) and pathways associated with disease status. We apply a rare variant collapsing approach and the usual association tests for common variants to identify candidates for further analysis using pathway-based and tree-based ensemble approaches. We use the mean log p-value approach to identify a top set of pathways and compare it to the pathways used in the simulation of the GAW17 data set. We conclude that the mean log p-value approach places those pathways, as well as related pathways, in the top list. We also apply stochastic gradient boosting to the selected subset of SNPs. When we compare the result of this tree-based method with the list of SNPs used in the data set simulation, we observe a number of false positives in addition to the correct SNPs.
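The mean log p-value pathway ranking mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact definition, so this assumes the pathway score is the mean of -log10(p) over the markers assigned to a pathway, with larger scores ranking higher. The function names (`mean_log_p`, `rank_pathways`), the pathway names, and the p-values are all hypothetical and are not taken from the GAW17 data.

```python
import math


def mean_log_p(p_values):
    """Mean of -log10(p) over the markers assigned to one pathway.

    Assumption: this is one common form of a mean log p-value score;
    the abstract does not state the exact formula that was used.
    """
    if not p_values:
        raise ValueError("pathway has no p-values")
    return sum(-math.log10(p) for p in p_values) / len(p_values)


def rank_pathways(pathway_p_values):
    """Return (pathway, score) pairs sorted from highest to lowest score."""
    scores = {name: mean_log_p(ps) for name, ps in pathway_p_values.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical association p-values per pathway (illustration only).
example = {
    "pathway_A": [0.001, 0.01, 0.1],
    "pathway_B": [0.5, 0.2, 0.8],
}

ranking = rank_pathways(example)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy input, the pathway whose markers have consistently small p-values receives the higher score. A real analysis would also need to account for pathway size and for correlation between markers, which this sketch ignores.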