Center for Computational Biology and Bioinformatics and Department of Electrical Engineering, Columbia University, New York, NY, USA.
BMC Genet. 2012 Oct 30;13:94. doi: 10.1186/1471-2156-13-94.
Typically, the first phase of a genome wide association study (GWAS) includes genotyping across hundreds of individuals and validation of the most significant SNPs. Allelotyping of pooled genomic DNA is a common approach to reduce the overall cost of the study. Knowledge of haplotype structure can provide additional information to single locus analyses. Several methods have been proposed for estimating haplotype frequencies in a population from pooled DNA data.
We introduce a technique for estimating haplotype frequencies in a population from pooled DNA samples, focusing on datasets with a small number of individuals per pool (2 or 3) and a large number of markers. We compare our method with the publicly available state-of-the-art algorithms HIPPO and HAPLOPOOL on datasets with varying numbers of pools and markers. Our algorithm improves on the competing methods in both accuracy and computational time when the number of markers is large, and performs comparably when the number of markers is smaller. Our method is implemented in the "Tree-Based Deterministic Sampling Pool" (TDSPool) package, which is available for download at http://www.ee.columbia.edu/~anastas/tdspool.
Using a tree-based deterministic sampling technique, we present an algorithm for haplotype frequency estimation from pooled data. Our method demonstrates superior performance on datasets with a large number of markers and could be the method of choice for haplotype frequency estimation in such datasets.
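To make the estimation problem concrete, the following minimal sketch shows why pooled allelotype data are ambiguous with respect to haplotypes. It assumes biallelic markers coded 0/1 and noise-free allele counts, and it uses brute-force enumeration purely for illustration; it is not the TDSPool algorithm, and the function names `pooled_counts` and `compatible_configurations` are hypothetical.

```python
from itertools import combinations_with_replacement, product


def pooled_counts(haplotypes):
    """Per-marker count of allele 1 summed over all haplotypes in a pool."""
    return tuple(sum(column) for column in zip(*haplotypes))


def compatible_configurations(counts, n_haplotypes):
    """List every unordered set of haplotypes whose pooled counts match.

    Brute force: exponential in the number of markers, so only usable
    for toy examples (assumption: error-free allele counts).
    """
    counts = tuple(counts)
    all_haplotypes = list(product((0, 1), repeat=len(counts)))
    return [
        config
        for config in combinations_with_replacement(all_haplotypes, n_haplotypes)
        if pooled_counts(config) == counts
    ]


# A pool of 2 diploid individuals carries 4 haplotypes; here, 3 markers.
pool = [(0, 1, 1), (1, 0, 1), (0, 0, 0), (1, 1, 0)]
counts = pooled_counts(pool)
configs = compatible_configurations(counts, n_haplotypes=4)
print("observed allele counts:", counts)
print("compatible haplotype configurations:", len(configs))
```

Only the per-marker counts are observed, so several haplotype configurations explain the same pool. The number of candidate haplotypes doubles with every added marker, which is why exhaustive enumeration like this becomes impractical for datasets with many markers.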