Urbanowicz Ryan J, White Bill C, Moore Jason H
Dartmouth College, 1 Medical Center Dr., Hanover, NH 03755, USA.
Genet Evol Comput Conf. 2008 Jul 12;2008:339-346.
The study of common, complex multifactorial diseases in genetic epidemiology is complicated by nonlinearity in the genotype-to-phenotype mapping, which is due, in part, to epistasis, or gene-gene interactions. Symbolic discriminant analysis (SDA) is a flexible modeling approach that uses genetic programming (GP) to evolve an optimal predictive model from a predefined collection of mathematical functions, constants, and attributes. SDA has been shown to be an effective strategy for modeling epistasis. In the present study, we introduce the genetic "mask" as a novel building block that exploits expert knowledge in the form of a pre-constructed relationship between two attributes. The goal of this study was to determine whether the availability of "mask" building blocks improves SDA performance. The results support the idea that pre-processing data improves GP performance.
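As a rough illustration of what a "mask" building block might look like, the sketch below treats a mask as a fixed lookup table over the genotypes of two attributes, yielding one derived attribute that a GP tree could use as a terminal. The abstract does not specify the encoding, so the 0/1/2 genotype coding, the 3x3 binary table, and the names `XOR_LIKE_MASK` and `apply_mask` are all assumptions made for this example, not the authors' implementation.

```python
from typing import List, Sequence

# Assumption: genotypes are coded 0, 1, 2 (e.g. AA, Aa, aa) and a mask is a
# 3x3 table of 0/1 values indexed by the genotypes of two attributes.
Mask = Sequence[Sequence[int]]

# Hypothetical mask with an XOR-like pattern: neither attribute is
# informative on its own, but the pair of genotypes is.
XOR_LIKE_MASK: Mask = (
    (0, 1, 0),
    (1, 0, 1),
    (0, 1, 0),
)


def apply_mask(mask: Mask, attr_a: Sequence[int], attr_b: Sequence[int]) -> List[int]:
    """Combine two genotype columns into one derived attribute via the mask."""
    if len(attr_a) != len(attr_b):
        raise ValueError("both attributes must cover the same individuals")
    # Look up the pre-constructed value for each individual's genotype pair.
    return [mask[a][b] for a, b in zip(attr_a, attr_b)]


# Four individuals, two attributes each.
attr_a = [0, 1, 2, 1]
attr_b = [1, 1, 0, 2]
print(apply_mask(XOR_LIKE_MASK, attr_a, attr_b))  # [1, 0, 0, 1]
```

Under these assumptions, the derived column already encodes the pairwise interaction, so the evolved model would not need to rediscover it from the two raw attributes.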