Dai James Y, Leblanc Michael, Smith Nicholas L, Psaty Bruce, Kooperberg Charles
Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N, M2-C200, Seattle, WA 98109, USA.
Biostatistics. 2009 Oct;10(4):680-93. doi: 10.1093/biostatistics/kxp023. Epub 2009 Jul 15.
Association studies have been widely used to identify genetic liability variants for complex diseases. While scanning a chromosomal region one single-nucleotide polymorphism (SNP) at a time may not fully exploit linkage disequilibrium, haplotype analyses tend to require a fairly large number of parameters, thus potentially losing power. Clustering algorithms, such as the cladistic approach, have been proposed to reduce the dimensionality, yet they have important limitations. We propose a SNP-Haplotype Adaptive REgression (SHARE) algorithm that seeks the most informative set of SNPs for genetic association in a targeted candidate region by growing and shrinking haplotypes one SNP at a time in a stepwise fashion, and comparing the prediction errors of different models via cross-validation. Depending on the evolutionary history of the disease mutations and the markers, this set may contain a single SNP or several SNPs that lay a foundation for haplotype analyses. Haplotype phase ambiguity is effectively accounted for by treating haplotype reconstruction as a part of the learning procedure. Simulations and a data application show that our method has improved power over existing methodologies and that the results are informative in the search for disease-causal loci.
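The stepwise search described above can be illustrated with a minimal Python sketch. It is not the published SHARE implementation, and it makes several simplifying assumptions: haplotype phase is taken as known (the abstract treats phase reconstruction as part of the learning procedure), a plain diplotype-mean predictor with squared error stands in for the haplotype regression model, and the search stops greedily at the first step that does not reduce the cross-validated error. All function names (`sub_haplotype`, `fit`, `predict`, `cv_error`, `stepwise_select`, `toy_data`) are hypothetical.

```python
from itertools import product
from statistics import mean

def sub_haplotype(hap, snps):
    """Restrict a haplotype (tuple of 0/1 alleles) to the chosen SNP indices."""
    return tuple(hap[i] for i in snps)

def diplotype(person, snps):
    """Unordered pair of sub-haplotypes carried by one individual."""
    h1, h2 = person
    return tuple(sorted((sub_haplotype(h1, snps), sub_haplotype(h2, snps))))

def fit(train, snps):
    """Assumed stand-in model: mean outcome per diplotype, overall mean as fallback."""
    groups = {}
    for person, y in train:
        groups.setdefault(diplotype(person, snps), []).append(y)
    overall = mean(y for _, y in train)
    return {d: mean(ys) for d, ys in groups.items()}, overall

def predict(model, person, snps):
    effects, overall = model
    return effects.get(diplotype(person, snps), overall)

def cv_error(data, snps, n_folds=5):
    """Mean squared prediction error under k-fold cross-validation.
    Folds are assigned by index, so real data should be shuffled beforehand."""
    snps = sorted(snps)
    total = 0.0
    for k in range(n_folds):
        train = [d for i, d in enumerate(data) if i % n_folds != k]
        test = [d for i, d in enumerate(data) if i % n_folds == k]
        model = fit(train, snps)
        total += sum((y - predict(model, p, snps)) ** 2 for p, y in test)
    return total / len(data)

def stepwise_select(data, n_snps, n_folds=5):
    """Grow or shrink the SNP set by one SNP per step while the CV error drops."""
    current = frozenset()
    best = cv_error(data, current, n_folds)
    while True:
        neighbours = [current | {j} for j in range(n_snps) if j not in current]
        neighbours += [current - {j} for j in current]
        scored = [(cv_error(data, s, n_folds), sorted(s), s) for s in neighbours]
        err, _, candidate = min(scored, key=lambda t: (t[0], t[1]))
        if err >= best:  # no single-SNP move improves the error: stop
            return sorted(current), best
        current, best = candidate, err

def toy_data(n_snps=4):
    """Noise-free toy set: every ordered pair of haplotypes occurs once; the
    outcome counts haplotypes carrying allele 1 at both SNP 0 and SNP 2."""
    haps = list(product((0, 1), repeat=n_snps))
    risk = lambda h: int(h[0] == 1 and h[2] == 1)
    return [((h1, h2), risk(h1) + risk(h2)) for h1 in haps for h2 in haps]

if __name__ == "__main__":
    print(stepwise_select(toy_data(), n_snps=4))
```

On the toy data the outcome depends only on the haplotype formed by SNPs 0 and 2, so the search is expected to keep those two SNPs and drop the uninformative ones. With real genotype data, the phase-known assumption would need to be replaced by a step that handles phase ambiguity.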