Kirkpatrick Bonnie, Armendariz Carlos Santos, Karp Richard M, Halperin Eran
Department of Electrical Engineering and Computer Sciences, UC Berkeley, CA, USA.
Bioinformatics. 2007 Nov 15;23(22):3048-55. doi: 10.1093/bioinformatics/btm435. Epub 2007 Sep 25.
The search for genetic variants that are linked to complex diseases such as cancer, Parkinson's disease, or Alzheimer's disease may lead to better treatments. Since haplotypes can serve as proxies for hidden variants, one method of finding the linked variants is to look for case-control associations between the haplotypes and disease. Finding these associations requires a high-quality estimation of the haplotype frequencies in the population. To this end, we present HaploPool, a method of estimating haplotype frequencies from blocks of consecutive SNPs.
HaploPool leverages the efficiency of DNA pools and estimates the population haplotype frequencies from pools of disjoint sets, each containing two or three unrelated individuals. We study the trade-off between pooling efficiency and accuracy of haplotype frequency estimates. For a fixed genotyping budget, HaploPool performs favorably on pools of two individuals as compared with a state-of-the-art non-pooled phasing method, PHASE. Of independent interest, HaploPool can be used to phase non-pooled genotype data with an accuracy approaching that of PHASE. We compared our algorithm to three programs that estimate haplotype frequencies from pooled data. HaploPool is an order of magnitude more efficient (at least six times faster), and considerably more accurate than previous methods. In contrast to previous methods, HaploPool performs well with missing data, genotyping errors and long haplotype blocks (of between 5 and 25 SNPs).
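To make the pooling setting concrete, the following is a minimal Python sketch of what a pooled genotype looks like and why it is ambiguous. It is not the HaploPool algorithm. It assumes biallelic SNPs coded 0/1 and an error-free pooled measurement that reports the exact minor-allele count per SNP; the function names `pooled_genotype` and `compatible_configurations` are hypothetical and chosen only for illustration.

```python
from itertools import combinations_with_replacement, product


def pooled_genotype(haplotypes):
    """Per-SNP minor-allele counts for a pool, given its haplotypes as 0/1 tuples."""
    return tuple(sum(column) for column in zip(*haplotypes))


def compatible_configurations(genotype, individuals_per_pool):
    """Enumerate unordered haplotype multisets that would yield the pooled genotype.

    Brute force over all 2^m haplotypes (m = number of SNPs); only feasible
    for very short blocks and is meant purely as an illustration.
    """
    n_haplotypes = 2 * individuals_per_pool  # two haplotypes per diploid individual
    all_haplotypes = product((0, 1), repeat=len(genotype))
    return [
        config
        for config in combinations_with_replacement(all_haplotypes, n_haplotypes)
        if pooled_genotype(config) == genotype
    ]


# A pool of two individuals, i.e. four haplotypes over a block of three SNPs.
pool = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (0, 1, 0)]
genotype = pooled_genotype(pool)
configs = compatible_configurations(genotype, individuals_per_pool=2)
print("pooled genotype:", genotype)
print("compatible haplotype configurations:", len(configs))
```

Several distinct haplotype configurations produce the same pooled genotype, which is why frequencies have to be estimated statistically across many pools rather than read off directly. The number of candidate haplotypes grows exponentially with block length, so exhaustive enumeration of this kind does not scale to the block sizes discussed in the abstract.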