McKinney Brett A, Reif David M, Ritchie Marylyn D, Moore Jason H
Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University Medical School, Nashville, Tennessee, USA.
Appl Bioinformatics. 2006;5(2):77-88. doi: 10.2165/00822942-200605020-00002.
Complex interactions among genes and environmental factors are known to play a role in common human disease aetiology. There is a growing body of evidence to suggest that complex interactions are 'the norm' and, rather than amounting to a small perturbation to classical Mendelian genetics, interactions may be the predominant effect. Traditional statistical methods are not well suited for detecting such interactions, especially when the data are high dimensional (many attributes or independent variables) or when interactions occur between more than two polymorphisms. In this review, we discuss machine-learning models and algorithms for identifying and characterising susceptibility genes in common, complex, multifactorial human diseases. We focus on the following machine-learning methods that have been used to detect gene-gene interactions: neural networks, cellular automata, random forests, and multifactor dimensionality reduction. We conclude with some ideas about how these methods and others can be integrated into a comprehensive and flexible framework for data mining and knowledge discovery in human genetics.
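The claim that single-locus methods miss interactions is easier to see with a small example. The sketch below is not taken from the review. It shows the core pooling step of multifactor dimensionality reduction: each genotype combination at the chosen loci is labelled "high risk" when its case:control ratio exceeds the overall ratio, which collapses several attributes into one. The helper names (`high_risk_cells`, `accuracy`, `xor_toy_data`), the binary genotype coding, and the noise-free XOR-style toy data are all hypothetical. Full MDR as usually described also uses three-level genotypes, cross-validation and balanced accuracy, and this sketch leaves those out.

```python
from collections import Counter


def high_risk_cells(genotype_rows, status):
    """Return genotype combinations whose case:control ratio exceeds the overall ratio.

    genotype_rows: sequence of tuples, one genotype value per selected locus.
    status: sequence of 1 (case) / 0 (control), aligned with genotype_rows.
    """
    cases = Counter(g for g, s in zip(genotype_rows, status) if s == 1)
    controls = Counter(g for g, s in zip(genotype_rows, status) if s == 0)
    threshold = sum(cases.values()) / sum(controls.values())
    high = set()
    for cell in set(cases) | set(controls):
        # Compare n_case / n_ctrl > threshold without dividing, so a cell
        # observed only in cases counts as high risk.
        if cases[cell] > threshold * controls[cell]:
            high.add(cell)
    return high


def accuracy(genotype_rows, status, high):
    """Fraction of individuals whose status matches the pooled high/low-risk label."""
    predicted = [1 if g in high else 0 for g in genotype_rows]
    return sum(p == s for p, s in zip(predicted, status)) / len(status)


def xor_toy_data(per_cell=10):
    """Hypothetical noise-free data: two binary SNPs, case exactly when they differ."""
    snp_a, snp_b, status = [], [], []
    for a in (0, 1):
        for b in (0, 1):
            for _ in range(per_cell):
                snp_a.append(a)
                snp_b.append(b)
                status.append(int(a != b))
    pair = list(zip(snp_a, snp_b))    # two-locus attribute
    single = [(a,) for a in snp_a]    # SNP A alone
    return pair, single, status


if __name__ == "__main__":
    pair, single, status = xor_toy_data()
    print(accuracy(pair, status, high_risk_cells(pair, status)))
    print(accuracy(single, status, high_risk_cells(single, status)))
```

In this toy model each SNP on its own has equal numbers of cases and controls at every genotype, so a one-locus model scores 0.5, no better than chance. The pooled two-locus attribute classifies every individual correctly. This is the kind of purely interactive pattern that the abstract says traditional single-locus methods are poorly suited to detect.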