An entropy-based approach for testing genetic epistasis underlying complex diseases.

Author information

Kang Guolian, Yue Weihua, Zhang Jifeng, Cui Yuehua, Zuo Yijun, Zhang Dai

Affiliation

Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA.

Publication information

J Theor Biol. 2008 Jan 21;250(2):362-74. doi: 10.1016/j.jtbi.2007.10.001. Epub 2007 Oct 6.

PMID:17996908

Abstract

The genetic basis of complex diseases is expected to be highly heterogeneous, involving complex interactions among multiple disease loci and environmental factors. Because interactions among a large number of genetic loci are multidimensional, efficient statistical approaches for handling high-order epistatic complexity have not been well developed. In this article, we introduce a new approach for testing genetic epistasis among multiple loci using an entropy-based statistic in a case-only design. The entropy-based statistic asymptotically follows a χ² distribution. Computer simulations show that the entropy-based approach has better control of type I error and higher power than the standard χ² test. Motivated by a schizophrenia data set, we propose a method for measuring and testing the relative entropy of a clinical phenotype, through which one can test the contribution or interaction of multiple disease loci to that phenotype. A sequential forward selection procedure is proposed to construct a genetic interaction network, which is illustrated with a tree-based diagram. The network clearly shows the relative importance of a set of genetic loci for a clinical phenotype. To show the utility of the new entropy-based approach, we apply it to two real data sets: a schizophrenia data set and a published malaria data set. Our approach provides a fast, testable framework for studying genetic epistasis in a case-only design.
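The abstract does not give the authors' formula, so the following is only a minimal sketch of a generic entropy-based independence test, not the method of this paper. It assumes a two-locus case-only setting in which, if the loci are unlinked and independent in the population, any association between their genotypes among cases is read as evidence of interaction. It uses the well-known identity G = 2N·I(A;B), where I is the mutual information in nats (H(A) + H(B) − H(A,B)) and G is the likelihood-ratio statistic, which is asymptotically χ² with (r−1)(c−1) degrees of freedom. All function names and the example genotype counts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def entropy(counts: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a vector of nonnegative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def mutual_information(table: Sequence[Sequence[float]]) -> float:
    """I(A;B) = H(A) + H(B) - H(A,B) for a two-way contingency table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    joint = [c for row in table for c in row]
    return entropy(row_sums) + entropy(col_sums) - entropy(joint)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability, computed as 1 - P(df/2, x/2) using the
    series for the regularized lower incomplete gamma function.
    Adequate for moderate p-values; in practice scipy.stats.chi2.sf is preferable."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    if z > 700:  # series terms would overflow; tail probability is negligible
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def case_only_entropy_test(table: Sequence[Sequence[float]]) -> Tuple[float, int, float]:
    """Hypothetical two-locus case-only test: genotype counts of locus A (rows)
    by locus B (columns) among cases. Returns (G, df, p-value)."""
    n = sum(sum(row) for row in table)
    r = sum(1 for row in table if sum(row) > 0)      # non-empty genotype classes
    c = sum(1 for col in zip(*table) if sum(col) > 0)
    g = max(0.0, 2.0 * n * mutual_information(table))  # G = 2 N I(A;B)
    df = (r - 1) * (c - 1)
    if df == 0:
        return g, 0, 1.0
    return g, df, chi2_sf(g, df)


if __name__ == "__main__":
    # Hypothetical 3x3 genotype counts (AA/Aa/aa by BB/Bb/bb) among cases.
    cases = [[30, 25, 5], [20, 40, 15], [5, 20, 40]]
    g, df, p = case_only_entropy_test(cases)
    print(f"G = {g:.3f}, df = {df}, p = {p:.3g}")
```

Working through entropies rather than expected counts keeps the link to the entropy framing explicit; a high-order (more than two loci) extension as described in the abstract would need a different construction that this sketch does not attempt.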
