Fast and accurate haplotype frequency estimation for large haplotype vectors from pooled DNA data.

Author information

Center for Computational Biology and Bioinformatics and Department of Electrical Engineering, Columbia University, New York, NY, USA.

Publication information

BMC Genet. 2012 Oct 30;13:94. doi: 10.1186/1471-2156-13-94.

DOI:10.1186/1471-2156-13-94

PMID:23110720

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3560217/

Abstract

BACKGROUND

Typically, the first phase of a genome wide association study (GWAS) includes genotyping across hundreds of individuals and validation of the most significant SNPs. Allelotyping of pooled genomic DNA is a common approach to reduce the overall cost of the study. Knowledge of haplotype structure can provide additional information to single locus analyses. Several methods have been proposed for estimating haplotype frequencies in a population from pooled DNA data.

RESULTS

We introduce a technique for estimating haplotype frequencies in a population from pooled DNA samples, focusing on datasets with a small number of individuals per pool (2 or 3) and a large number of markers. We compare our method with the publicly available state-of-the-art algorithms HIPPO and HAPLOPOOL on datasets with varying numbers of pools and markers. We demonstrate that our algorithm improves on competing methods in both accuracy and computational time for large numbers of markers, while performing comparably for smaller numbers of markers. Our method is implemented in the "Tree-Based Deterministic Sampling Pool" (TDSPool) package, which is available for download at http://www.ee.columbia.edu/~anastas/tdspool.
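To make the estimation problem concrete, the following is a minimal sketch of the pooled-data setting, not the TDSPool algorithm. It assumes biallelic markers coded 0/1, error-free allele counts, and pools of equal size. It enumerates every set of haplotypes compatible with one pool's allele counts, then runs a plain expectation-maximization (EM) baseline over those sets. All function names are hypothetical, and the brute-force enumeration is only feasible for a handful of markers, which is precisely the limitation that motivates faster methods for large marker sets.

```python
from collections import Counter
from itertools import combinations_with_replacement, product
from math import factorial, prod


def pooled_counts(haplotypes):
    """Per-marker count of allele 1 summed over all haplotypes in one pool."""
    return tuple(sum(column) for column in zip(*haplotypes))


def compatible_configs(counts, n_haplotypes):
    """All multisets of n_haplotypes haplotypes whose allele counts equal `counts`."""
    all_haps = product((0, 1), repeat=len(counts))
    return [cfg for cfg in combinations_with_replacement(all_haps, n_haplotypes)
            if pooled_counts(cfg) == tuple(counts)]


def config_probability(cfg, freqs):
    """Multinomial probability of drawing the multiset `cfg` given `freqs`."""
    multiplicity = Counter(cfg)
    coef = factorial(len(cfg))
    for c in multiplicity.values():
        coef //= factorial(c)
    return coef * prod(freqs[h] ** c for h, c in multiplicity.items())


def em_frequencies(pools, n_haplotypes, n_iter=50):
    """Naive EM estimate of haplotype frequencies from pooled allele counts.

    pools: list of per-marker allele-1 counts, one tuple per pool.
    n_haplotypes: haplotypes per pool (twice the number of diploid individuals).
    """
    all_haps = list(product((0, 1), repeat=len(pools[0])))
    freqs = {h: 1.0 / len(all_haps) for h in all_haps}
    configs = [compatible_configs(c, n_haplotypes) for c in pools]
    for _ in range(n_iter):
        expected = dict.fromkeys(all_haps, 0.0)
        for cfgs in configs:
            # E-step: posterior weight of each compatible configuration.
            weights = [config_probability(cfg, freqs) for cfg in cfgs]
            total = sum(weights)
            for cfg, w in zip(cfgs, weights):
                for h in cfg:
                    expected[h] += w / total
        # M-step: renormalize expected haplotype counts into frequencies.
        norm = sum(expected.values())
        freqs = {h: e / norm for h, e in expected.items()}
    return freqs


# One pool of 2 individuals (4 haplotypes) typed at 2 markers.
pool = [(0, 0), (0, 0), (1, 1), (1, 1)]
observed = pooled_counts(pool)
ambiguous = compatible_configs(observed, n_haplotypes=4)
```

Here `observed` carries only the allele count at each marker, so several distinct haplotype sets in `ambiguous` explain the same pool. The number of candidate haplotypes doubles with each added marker, so this exhaustive approach quickly becomes impractical as the marker count grows.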

CONCLUSIONS

Using a tree-based deterministic sampling technique, we present an algorithm for haplotype frequency estimation from pooled data. Our method demonstrates superior performance on datasets with large numbers of markers and could be the method of choice for haplotype frequency estimation in such datasets.

Similar articles

Fast and accurate haplotype frequency estimation for large haplotype vectors from pooled DNA data.

BMC Genet. 2012 Oct 30;13:94. doi: 10.1186/1471-2156-13-94.

A haplotype inference algorithm for trios based on deterministic sampling.

BMC Genet. 2010 Aug 23;11:78. doi: 10.1186/1471-2156-11-78.

HAPLOPOOL: improving haplotype frequency estimation through DNA pools and phylogenetic modeling.

Bioinformatics. 2007 Nov 15;23(22):3048-55. doi: 10.1093/bioinformatics/btm435. Epub 2007 Sep 25.

Maximum-parsimony haplotype frequencies inference based on a joint constrained sparse representation of pooled DNA.

BMC Bioinformatics. 2013 Sep 8;14:270. doi: 10.1186/1471-2105-14-270.

Estimating population haplotype frequencies from pooled SNP data using incomplete database information.

Bioinformatics. 2009 Dec 15;25(24):3296-302. doi: 10.1093/bioinformatics/btp584. Epub 2009 Oct 27.

PoooL: an efficient method for estimating haplotype frequencies from large DNA pools.

Bioinformatics. 2008 Sep 1;24(17):1942-8. doi: 10.1093/bioinformatics/btn324. Epub 2008 Jun 23.

Accurate estimation of haplotype frequency from pooled sequencing data and cost-effective identification of rare haplotype carriers by overlapping pool sequencing.

Bioinformatics. 2015 Feb 15;31(4):515-22. doi: 10.1093/bioinformatics/btu670. Epub 2014 Oct 9.

Estimating haplotype frequencies by combining data from large DNA pools with database information.

IEEE/ACM Trans Comput Biol Bioinform. 2011 Jan-Mar;8(1):36-44. doi: 10.1109/TCBB.2009.71.

On the use of DNA pooling to estimate haplotype frequencies.

Genet Epidemiol. 2003 Jan;24(1):74-82. doi: 10.1002/gepi.10195.

Estimating population haplotype frequencies from pooled DNA samples using PHASE algorithm.

Genet Res (Camb). 2008 Dec;90(6):509-24. doi: 10.1017/S0016672308009877.

Cited by

A joint use of pooling and imputation for genotyping SNPs.

BMC Bioinformatics. 2022 Oct 13;23(1):421. doi: 10.1186/s12859-022-04974-7.

Regionally Smoothed Meta-Analysis Methods for GWAS Datasets.

Genet Epidemiol. 2016 Feb;40(2):154-60. doi: 10.1002/gepi.21949. Epub 2015 Dec 28.

A sequential Monte Carlo framework for haplotype inference in CNV/SNP genotype data.

EURASIP J Bioinform Syst Biol. 2014;2014(1):7. doi: 10.1186/1687-4153-2014-7. Epub 2014 Apr 24.

An EM algorithm based on an internal list for estimating haplotype distributions of rare variants from pooled genotype data.

BMC Genet. 2013 Sep 13;14:82. doi: 10.1186/1471-2156-14-82.

References

Estimating haplotype frequencies by combining data from large DNA pools with database information.

IEEE/ACM Trans Comput Biol Bioinform. 2011 Jan-Mar;8(1):36-44. doi: 10.1109/TCBB.2009.71.

A study of the efficiency of pooling in haplotype estimation.

Bioinformatics. 2010 Oct 15;26(20):2556-63. doi: 10.1093/bioinformatics/btq492. Epub 2010 Aug 27.

Estimating population haplotype frequencies from pooled SNP data using incomplete database information.

Bioinformatics. 2009 Dec 15;25(24):3296-302. doi: 10.1093/bioinformatics/btp584. Epub 2009 Oct 27.

Estimating population haplotype frequencies from pooled DNA samples using PHASE algorithm.

Genet Res (Camb). 2008 Dec;90(6):509-24. doi: 10.1017/S0016672308009877.

Computationally feasible estimation of haplotype frequencies from pooled DNA with and without Hardy-Weinberg equilibrium.

Bioinformatics. 2009 Feb 1;25(3):379-86. doi: 10.1093/bioinformatics/btn623. Epub 2008 Dec 2.

PoooL: an efficient method for estimating haplotype frequencies from large DNA pools.

Bioinformatics. 2008 Sep 1;24(17):1942-8. doi: 10.1093/bioinformatics/btn324. Epub 2008 Jun 23.

HAPLOPOOL: improving haplotype frequency estimation through DNA pools and phylogenetic modeling.

Bioinformatics. 2007 Nov 15;23(22):3048-55. doi: 10.1093/bioinformatics/btm435. Epub 2007 Sep 25.

Identification of the genetic basis for complex disorders by use of pooling-based genomewide single-nucleotide-polymorphism association studies.

Am J Hum Genet. 2007 Jan;80(1):126-39. doi: 10.1086/510686. Epub 2006 Dec 6.

Two-stage designs in case-control association analysis.

Genetics. 2006 Jul;173(3):1747-60. doi: 10.1534/genetics.105.042648. Epub 2006 Apr 19.

Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation.

Am J Hum Genet. 2005 Mar;76(3):449-62. doi: 10.1086/428594. Epub 2005 Jan 31.
