Motsinger Alison A, Reif David M, Fanelli Theresa J, Davis Anna C, Ritchie Marylyn D
Center for Human Genetics Research, Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, TN 37232, USA.
Proc IEEE Symp Comput Intell Bioinforma Comput Biol. 2007 Apr 1;2007:1-8.
One of the most important goals in genetic epidemiology is identifying the genetic factors, or features, that predict complex diseases. Gene-gene interactions are ubiquitous in the etiology of common diseases, which creates an important analytical challenge and has spurred the introduction of novel computational approaches. One such method is the grammatical evolution neural network (GENN). GENN has shown high power to detect such interactions in simulation studies, but previous studies have ignored an important feature of most genetic data: linkage disequilibrium (LD). LD is the non-random association of alleles at different loci, which are not necessarily on the same chromosome. It produces strong correlation between variables in a dataset, which can complicate analysis. In the current study, data simulated under a range of LD patterns are used to assess the impact of such correlated variables on the performance of GENN. Our results show that patterns of strong LD do not decrease the power of GENN to detect genetic associations; they actually increase it.
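To make the notion of LD as correlation between variables concrete, the sketch below computes two standard LD measures for a pair of biallelic loci from haplotype counts: the coefficient D = p(AB) - p(A)p(B) and the squared correlation r^2. It is an illustration only and is not taken from the study; the function name `ld_stats` and the count arguments are hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2) for two biallelic loci given the four haplotype counts.

    Hypothetical helper for illustration; not part of GENN or of the study.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("at least one haplotype is required")

    p_AB = n_AB / total            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total    # frequency of allele B at locus 2

    # D is the deviation of the observed haplotype frequency from the
    # value expected if the two loci were independent.
    D = p_AB - p_A * p_B

    # r^2 rescales D^2 by the allele-frequency variances, giving the
    # squared correlation between the two loci (0 = none, 1 = complete).
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = (D * D) / denom if denom > 0 else 0.0
    return D, r2


if __name__ == "__main__":
    print(ld_stats(50, 0, 0, 50))    # alleles always co-occur: complete LD
    print(ld_stats(25, 25, 25, 25))  # independent loci: no LD
```

An r^2 near 1 means the two markers carry almost the same information, which is the kind of correlated input the simulations are designed to test.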