Department of Quantitative Methods, Technical University of Cartagena, Paseo Alfonso XIII, 50, 30203, Cartagena, Spain.
BMC Genet. 2010 Mar 23;11:19. doi: 10.1186/1471-2156-11-19.
The etiology of complex diseases results from a combination of genetic and environmental factors, usually many of them, each with a small effect. Identifying these small-effect contributing factors remains a demanding task. Clearly, more powerful tests of genetic association are needed, especially for the identification of rare effects.
We introduce a new genetic association test based on symbolic dynamics and symbolic entropy. Using freely available software, we applied this entropy test and a conventional test to simulated and real datasets, both to illustrate the method and to estimate type I error and power. We also compared the new entropy test with the Fisher exact test for assessing association with low-frequency SNPs. The entropy test is generally more powerful than the conventional test, and it can be significantly more powerful when the genotypic test is applied to markers with low allele frequency. We also show that both the Fisher and entropy tests are optimal for testing association with low-frequency SNPs (MAF around 1-5%), and that both are conservative for very rare SNPs (MAF < 1%).
We have developed a new, simple, consistent and powerful test to detect genetic association of biallelic/SNP markers in case-control data, using symbolic dynamics and symbolic entropy as a measure of gene dependence. We also provide the standard asymptotic distribution of this test statistic. Because the test is based on entropy measures, it avoids smoothed nonparametric estimation. The entropy test is generally as good as, or more powerful than, the conventional and Fisher tests. Furthermore, the entropy test is more computationally efficient than Fisher's exact test, especially for a large number of markers. This entropy-based test therefore has the advantage of being optimal for most SNPs, regardless of their allele frequency (minor allele frequency (MAF) between 1% and 50%). This property is quite beneficial, since many researchers tend to discard low allele-frequency SNPs from their analyses. Researchers can now apply the same statistical test of association to all SNPs in a single analysis, which can be especially helpful for detecting rare effects.
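The abstract does not spell out the test statistic, so the following is only an illustration of how an entropy measure can quantify genotype-phenotype dependence in a 2 x 3 case-control genotype table. It uses the classical likelihood-ratio (G) statistic written as mutual information, 2N * I(status; genotype), which under no association is asymptotically chi-square with 2 degrees of freedom. This is a stand-in technique and is not necessarily the authors' symbolic-entropy statistic; the function names and toy counts are hypothetical. A conventional Pearson chi-square genotypic test is included for comparison.

```python
import math


def entropy(counts):
    """Shannon entropy (in nats) of the empirical distribution given by counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def entropy_association_test(cases, controls):
    """Entropy-based (likelihood-ratio / G) genotypic test on a 2x3 table.

    Hypothetical helper, not the authors' published statistic.
    cases, controls: genotype counts in the order (AA, Aa, aa).
    Returns (statistic, p_value), where statistic = 2 * N * I(status; genotype).
    Assumes all three genotype classes are observed, giving 2 degrees of freedom.
    """
    n_case, n_ctrl = sum(cases), sum(controls)
    n = n_case + n_ctrl
    pooled = [a + b for a, b in zip(cases, controls)]
    # Mutual information = H(genotype) - H(genotype | disease status)
    h_cond = (n_case / n) * entropy(cases) + (n_ctrl / n) * entropy(controls)
    mutual_info = entropy(pooled) - h_cond
    stat = 2.0 * n * mutual_info
    # The chi-square survival function with 2 df is exactly exp(-x / 2)
    return stat, math.exp(-stat / 2.0)


def pearson_genotypic_test(cases, controls):
    """Conventional Pearson chi-square genotypic test (2 df), for comparison."""
    n_case, n_ctrl = sum(cases), sum(controls)
    n = n_case + n_ctrl
    stat = 0.0
    for a, b in zip(cases, controls):
        col_total = a + b
        for observed, row_total in ((a, n_case), (b, n_ctrl)):
            expected = row_total * col_total / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat, math.exp(-stat / 2.0)


if __name__ == "__main__":
    cases, controls = [30, 50, 20], [45, 45, 10]  # hypothetical counts
    print("entropy (G):", entropy_association_test(cases, controls))
    print("Pearson:    ", pearson_genotypic_test(cases, controls))
```

With these hypothetical counts the two statistics come out at about 6.7 (entropy) and 6.6 (Pearson), both with p of about 0.04. The two agree closely at moderate counts; for very rare SNPs, where expected counts are small, neither chi-square approximation should be trusted without checking.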