Clustering of haplotypes based on phylogeny: how good a strategy for association testing?

Author information

Bardel Claire, Darlu Pierre, Génin Emmanuelle

Affiliations

INSERM U535, Hôpital Paul Brousse, Villejuif, France.

Publication information

Eur J Hum Genet. 2006 Feb;14(2):202-6. doi: 10.1038/sj.ejhg.5201501.

PMID:16306882

Abstract

Haplotypes are now widely used in association studies between markers and disease susceptibility loci. However, when a large number of markers are considered, the number of possible haplotypes increases, leading to two problems: an increased number of degrees of freedom, which may result in a lack of power, and the existence of rare haplotypes, which may be difficult to take into account in the statistical analysis. In a recent paper, Durrant et al proposed a method, CLADHC, to group haplotypes based on distance matrices and showed that this could considerably increase the power of the association test compared with either single-locus analysis or haplotype analysis without prior grouping. Although the authors considered different one-disease-locus susceptibility models in their simulations, they did not study the impact of the linkage disequilibrium (LD) pattern or of the susceptibility allele frequency on their conclusions. Here, using haplotype data from five genomic regions of different lengths and with different LD patterns, we show that, when a single disease susceptibility locus is simulated, prior grouping of haplotypes based on the algorithm of Durrant et al does not increase the power of association testing except under very particular LD patterns and allele frequencies.
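The degrees-of-freedom problem described above can be made concrete with a small sketch. With L biallelic markers there are up to 2^L possible haplotypes, and a 2 x k case/control test over k observed haplotypes has k - 1 degrees of freedom. Grouping similar haplotypes before testing reduces k. The code below is NOT CLADHC: as an illustrative assumption it substitutes plain average-linkage clustering on Hamming distance and a Pearson chi-square test, and the haplotype counts, the helper names (`hamming`, `cluster_haplotypes`, `pearson_chi2`) and the `max_dist` threshold are all hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of SNP positions at which two haplotypes (strings of 0/1 alleles) differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_haplotypes(haps: list[str], max_dist: float) -> list[list[str]]:
    """Average-linkage agglomerative clustering on Hamming distance (illustrative stand-in,
    not the CLADHC algorithm). Merges the closest pair of clusters until the smallest
    average between-cluster distance exceeds max_dist."""
    clusters = [[h] for h in haps]

    def avg_dist(pair: tuple[int, int]) -> float:
        ci, cj = clusters[pair[0]], clusters[pair[1]]
        return sum(hamming(a, b) for a in ci for b in cj) / (len(ci) * len(cj))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2), key=avg_dist)
        if avg_dist((i, j)) > max_dist:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # combinations yields i < j, so index i is unaffected
    return clusters


def pearson_chi2(cases: list[int], controls: list[int]) -> tuple[float, int]:
    """Pearson chi-square for a 2 x k case/control table; returns (statistic, df = k - 1)."""
    n_case, n_ctrl = sum(cases), sum(controls)
    n = n_case + n_ctrl
    stat = 0.0
    for a, b in zip(cases, controls):
        col = a + b  # assumed > 0: every haplotype group is observed at least once
        for observed, row_total in ((a, n_case), (b, n_ctrl)):
            expected = row_total * col / n
            stat += (observed - expected) ** 2 / expected
    return stat, len(cases) - 1


# Hypothetical haplotype counts over 4 SNPs (illustrative numbers, not data from the study).
case_counts = {"0000": 30, "0001": 5, "0011": 25, "0111": 4, "1111": 36}
ctrl_counts = {"0000": 45, "0001": 6, "0011": 20, "0111": 3, "1111": 26}
haps = list(case_counts)

# Ungrouped test: one column per haplotype.
stat_full, df_full = pearson_chi2([case_counts[h] for h in haps],
                                  [ctrl_counts[h] for h in haps])

# Grouped test: pool case and control counts within each cluster.
groups = cluster_haplotypes(haps, max_dist=1.0)
stat_grp, df_grp = pearson_chi2([sum(case_counts[h] for h in g) for g in groups],
                                [sum(ctrl_counts[h] for h in g) for g in groups])

print(groups)  # → [['0000', '0001'], ['0011', '0111'], ['1111']]
print(f"ungrouped: X2={stat_full:.2f}, df={df_full}")
print(f"grouped:   X2={stat_grp:.2f}, df={df_grp}")
```

Here grouping cuts the test from 4 to 2 degrees of freedom. Note the trade-off: pooling columns of a 2 x k table can never increase the Pearson statistic (by the Cauchy-Schwarz inequality), so fewer degrees of freedom only pay off when the pooled haplotypes carry similar risk. This is one way to see why prior grouping does not automatically increase power.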
