Fine mapping of disease genes via haplotype clustering.

Authors

Waldron E R B, Whittaker J C, Balding D J

Affiliation

Department of Epidemiology and Public Health, Imperial College London, St. Mary's Campus, Norfolk Place, London W2 1PG, United Kingdom.

Publication information

Genet Epidemiol. 2006 Feb;30(2):170-9. doi: 10.1002/gepi.20134.

PMID:16385468

Abstract

We propose an algorithm for analysing SNP-based population association studies, which is a development of that introduced by Molitor et al. [2003: Am J Hum Genet 73:1368-1384]. It uses clustering of haplotypes to overcome the major limitations of many current haplotype-based approaches. We define a between-haplotype score that is simple, yet appears to capture much of the information about evolutionary relatedness of the haplotypes in the vicinity of an (unobserved) putative causal locus. Haplotype clusters can then be defined via a putative ancestral haplotype and a cut-off distance. The number of an individual's two haplotypes that lie within the cluster predicts the individual's genotype at the causal locus. This predicted genotype can then be investigated for association with the phenotype of interest. We implement our approach within a Markov-chain Monte Carlo algorithm that, in effect, searches over locations and ancestral haplotypes to identify large, case-rich clusters. The algorithm successfully fine-maps a causal mutation in a test analysis using real data, and achieves almost 98% accuracy in predicting the genotype at the causal locus. A simulation study indicates that the new algorithm is substantially superior to alternative approaches, and it also allows us to identify situations in which multi-point approaches can substantially improve over single-SNP analyses. Our algorithm runs quickly and there is scope for extension to a wide range of disease models and genomic scales.
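
The cluster-membership step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the between-haplotype score, so the sketch assumes one plausible choice: the number of contiguous markers around the putative locus at which a haplotype matches the ancestral haplotype, with the cut-off expressed as a minimum shared length. The names `shared_length`, `predicted_genotype`, and `min_shared` are hypothetical, and the Markov-chain Monte Carlo search over locations and ancestral haplotypes is not shown.

```python
from typing import Sequence, Tuple

Haplotype = Sequence[int]  # alleles coded 0/1 at ordered SNP markers


def shared_length(h: Haplotype, ancestral: Haplotype, locus: int) -> int:
    """Count contiguous markers matching the ancestral haplotype around a locus.

    `locus` is a gap index: the putative causal site is taken to lie between
    markers locus - 1 and locus. This score is an assumption made for
    illustration, not the definition used in the paper.
    """
    left = 0
    i = locus - 1
    while i >= 0 and h[i] == ancestral[i]:  # extend leftwards until a mismatch
        left += 1
        i -= 1
    right = 0
    j = locus
    while j < len(h) and h[j] == ancestral[j]:  # extend rightwards likewise
        right += 1
        j += 1
    return left + right


def predicted_genotype(
    pair: Tuple[Haplotype, Haplotype],
    ancestral: Haplotype,
    locus: int,
    min_shared: int,  # hypothetical cut-off: minimum shared length for membership
) -> int:
    """Return 0, 1, or 2: how many of an individual's haplotypes are in the cluster."""
    return sum(shared_length(h, ancestral, locus) >= min_shared for h in pair)


# Toy data (invented for illustration): six markers, causal site between markers 2 and 3.
ancestral = [1, 0, 1, 1, 0, 1]
individuals = [
    ([1, 0, 1, 1, 0, 1], [0, 0, 1, 1, 0, 0]),
    ([1, 1, 0, 0, 1, 1], [1, 0, 1, 1, 0, 1]),
    ([0, 1, 0, 0, 1, 0], [1, 0, 1, 0, 0, 1]),
]
genotypes = [predicted_genotype(p, ancestral, locus=3, min_shared=4) for p in individuals]
print(genotypes)  # [2, 1, 0]
```

The predicted genotypes could then be tested for association with case/control status, for example by comparing their distribution between cases and controls.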
