Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Lübeck, Germany.
Genet Epidemiol. 2009;33 Suppl 1:S51-7. doi: 10.1002/gepi.20473.
Recently, genome-wide association studies have substantially expanded our knowledge about genetic variants that influence susceptibility to complex diseases. Although standard statistical tests applied to each single-nucleotide polymorphism (SNP) separately are able to capture main genetic effects, different approaches are necessary to identify SNPs that influence disease risk jointly or through complex interactions. Experimental and simulated genome-wide SNP data provided by the Genetic Analysis Workshop 16 afforded an opportunity to analyze the applicability and benefit of several machine learning methods. Penalized regression, ensemble methods, and network analyses resulted in several new findings, while known and simulated genetic risk variants were also identified. In conclusion, machine learning approaches are promising complements to standard single- and multi-SNP analysis methods for understanding the overall genetic architecture of complex human diseases. However, because these approaches are not optimized for genome-wide SNP data, improved implementations and new variable selection procedures are required.