Brief Bioinform. 2018 Jan 1;19(1):136-147. doi: 10.1093/bib/bbw086.
Genome-wide association studies are moving to genome-wide interaction studies, as the genetic background of many diseases appears to be more complex than previously supposed. Thus, many statistical approaches have been proposed to detect gene-gene (GxG) interactions, among them numerous information theory-based methods inspired by the concept of entropy. These are suggested as particularly powerful and, because of their nonlinearity, as better able to capture nonlinear relationships between genetic variants and/or variables. However, the proposed entropy-based estimators differ to a surprising extent in their construction and even with respect to the basic definition of an interaction. Moreover, not every entropy-based measure of interaction is accompanied by a proper statistical test. To shed light on this, a systematic review of the literature is presented, answering the following questions: (1) How are GxG interactions defined within the framework of information theory? (2) Which entropy-based test statistics are available? (3) Which underlying distribution do the test statistics follow? (4) What are the stated strengths and limitations of these test statistics?
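As background for question (1), one commonly cited entropy-based definition of a GxG interaction is the information gain (interaction information) IG(G1;G2;Y) = I(G1,G2;Y) - I(G1;Y) - I(G2;Y), where I denotes mutual information between genotypes G1, G2 and phenotype Y. The following Python sketch illustrates a naive plug-in estimator of this quantity under simple assumptions (categorical genotypes coded 0/1/2, binary phenotype); the function names and toy data are illustrative and do not reproduce any specific estimator from the reviewed papers.

```python
import numpy as np

def entropy(labels):
    """Plug-in (maximum-likelihood) entropy estimate in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), with X and Y given as 1-D label arrays."""
    joint = np.char.add(x.astype(str), y.astype(str))  # joint labels (fixed-width codes)
    return entropy(x) + entropy(y) - entropy(joint)

def information_gain(g1, g2, y):
    """IG(G1;G2;Y) = I(G1,G2;Y) - I(G1;Y) - I(G2;Y).
    A positive value is often interpreted as evidence of a GxG interaction."""
    g12 = np.char.add(g1.astype(str), g2.astype(str))  # joint genotype at the SNP pair
    return (mutual_information(g12, y)
            - mutual_information(g1, y)
            - mutual_information(g2, y))

# Hypothetical toy data: genotypes coded 0/1/2, phenotype depending on the joint configuration
rng = np.random.default_rng(0)
g1 = rng.integers(0, 3, size=1000)
g2 = rng.integers(0, 3, size=1000)
y = ((g1 == 1) & (g2 == 1)).astype(int)
print(f"IG(G1;G2;Y) = {information_gain(g1, g2, y):.3f} bits")
```

Note that this plug-in estimator is biased in small samples and, as the abstract emphasizes, such a point estimate alone is not a statistical test; a null distribution (e.g. via permutation or an asymptotic approximation) would still be required.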