Discipline of Mathematical Sciences, Queensland University of Technology, Gardens Point, Brisbane, Queensland 4001, Australia.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Nov-Dec;8(6):1580-91. doi: 10.1109/TCBB.2011.46.
Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets come inherent challenges that call for new methods of statistical analysis and modeling. Because a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and with a Bayesian logistic regression with stochastic search variable selection.
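To make the "tree-like structures" of logic regression concrete, the sketch below evaluates a Boolean logic tree over binary SNP indicators and passes the result through a logistic link. It is a minimal illustration, not any of the reviewed implementations: the nested-tuple tree encoding, the function names `evaluate` and `predict_probability`, the example rule, and the coefficient values are all assumptions chosen for illustration, and the binary coding of each SNP (for example, carrier versus non-carrier of the minor allele) is likewise assumed.

```python
import math

# Hypothetical encoding: a logic tree is a nested tuple of the form
#   ("var", i), ("not", t), ("and", a, b) or ("or", a, b),
# where i indexes a binary SNP indicator in the input vector x.
def evaluate(tree, x):
    """Evaluate the Boolean logic tree on one vector of binary predictors."""
    op = tree[0]
    if op == "var":
        return bool(x[tree[1]])
    if op == "not":
        return not evaluate(tree[1], x)
    if op == "and":
        return evaluate(tree[1], x) and evaluate(tree[2], x)
    if op == "or":
        return evaluate(tree[1], x) or evaluate(tree[2], x)
    raise ValueError(f"unknown operator: {op!r}")

def predict_probability(tree, x, intercept, coefficient):
    """Single-tree model with a logistic link: logit(p) = b0 + b1 * L(x)."""
    linear = intercept + coefficient * evaluate(tree, x)
    return 1.0 / (1.0 + math.exp(-linear))

# Assumed epistatic rule for illustration: (SNP0 AND SNP1) OR (NOT SNP2).
TREE = ("or", ("and", ("var", 0), ("var", 1)), ("not", ("var", 2)))

if __name__ == "__main__":
    # Illustrative coefficients only; in practice they are estimated from data.
    for x in [(1, 1, 1), (1, 0, 1), (0, 0, 0)]:
        p = predict_probability(TREE, x, intercept=-1.0, coefficient=2.0)
        print(x, evaluate(TREE, x), round(p, 3))
```

The search over candidate trees, which is where the reviewed LR variants differ, is deliberately left out; the sketch only shows how a fixed tree turns an interaction among several loci into a single predictor.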