Fast and accurate haplotype frequency estimation for large haplotype vectors from pooled DNA data.

Affiliation

Center for Computational Biology and Bioinformatics and Department of Electrical Engineering, Columbia University, New York, NY, USA.

Publication information

BMC Genet. 2012 Oct 30;13:94. doi: 10.1186/1471-2156-13-94.

DOI: 10.1186/1471-2156-13-94
PMID: 23110720
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3560217/
Abstract

BACKGROUND

Typically, the first phase of a genome wide association study (GWAS) includes genotyping across hundreds of individuals and validation of the most significant SNPs. Allelotyping of pooled genomic DNA is a common approach to reduce the overall cost of the study. Knowledge of haplotype structure can provide additional information to single locus analyses. Several methods have been proposed for estimating haplotype frequencies in a population from pooled DNA data.

RESULTS

We introduce a technique for haplotype frequency estimation in a population from pooled DNA samples, focusing on datasets containing a small number of individuals per pool (2 or 3 individuals) and a large number of markers. We compare our method with the publicly available state-of-the-art algorithms HIPPO and HAPLOPOOL on datasets with varying numbers of pools and markers. We demonstrate that our algorithm provides improvements in accuracy and computational time over competing methods for large numbers of markers while demonstrating comparable performance for smaller numbers of markers. Our method is implemented in the "Tree-Based Deterministic Sampling Pool" (TDSPool) package, which is available for download at http://www.ee.columbia.edu/~anastas/tdspool.

CONCLUSIONS

Using a tree-based deterministic sampling technique, we present an algorithm for haplotype frequency estimation from pooled data. Our method demonstrates superior performance in datasets with a large number of markers and could be the method of choice for haplotype frequency estimation in such datasets.
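
To make the pooled-data setting concrete, here is a minimal sketch of the observation model and of a textbook expectation-maximization (EM) estimator. It is not the TDSPool algorithm, and none of the function names below come from the TDSPool package; they are hypothetical. The sketch assumes biallelic markers coded 0/1, noise-free allele counts per pool, and random mating, so a pool of 2 individuals contributes 4 haplotypes. Because it enumerates every haplotype configuration, it is only feasible for a handful of markers, which is the scaling problem that motivates methods designed for large marker sets.

```python
from collections import Counter
from itertools import combinations_with_replacement, product
from math import factorial, prod


def pool_counts(haplotypes):
    """Per-marker count of allele 1 across all haplotypes in one pool."""
    return tuple(sum(column) for column in zip(*haplotypes))


def compatible_configs(counts, n_hap):
    """All multisets of n_hap haplotypes whose allele counts equal `counts`."""
    all_haps = product((0, 1), repeat=len(counts))
    return [cfg for cfg in combinations_with_replacement(all_haps, n_hap)
            if pool_counts(cfg) == tuple(counts)]


def multinomial_coeff(config):
    """Number of distinct orderings of the haplotypes in a multiset."""
    coef = factorial(len(config))
    for multiplicity in Counter(config).values():
        coef //= factorial(multiplicity)
    return coef


def em_frequencies(pools, n_hap, n_iter=200):
    """Baseline EM estimate of haplotype frequencies from pooled allele counts.

    pools : list of per-marker allele-count tuples, one tuple per pool
    n_hap : haplotypes per pool (2 * individuals per pool)
    """
    n_markers = len(pools[0])
    all_haps = list(product((0, 1), repeat=n_markers))
    freq = {h: 1.0 / len(all_haps) for h in all_haps}  # uniform start
    configs = [compatible_configs(p, n_hap) for p in pools]
    for _ in range(n_iter):
        expected = Counter()
        for cfgs in configs:
            # E-step: posterior weight of each configuration for this pool
            weights = [multinomial_coeff(c) * prod(freq[h] for h in c)
                       for c in cfgs]
            total = sum(weights)
            for cfg, w in zip(cfgs, weights):
                for h in cfg:
                    expected[h] += w / total
        # M-step: normalize expected haplotype counts into frequencies
        norm = sum(expected.values())
        freq = {h: expected[h] / norm for h in all_haps}
    return freq
```

For example, `em_frequencies([(4, 0), (2, 2), (3, 1)], n_hap=4)` would estimate frequencies for the four possible two-marker haplotypes from three pools of two individuals each.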

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b02a/3560217/0e8387edee33/1471-2156-13-94-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b02a/3560217/480c4d69b911/1471-2156-13-94-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b02a/3560217/fbda678e8896/1471-2156-13-94-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b02a/3560217/70deab93f6ac/1471-2156-13-94-4.jpg

Similar articles

1
Fast and accurate haplotype frequency estimation for large haplotype vectors from pooled DNA data.
BMC Genet. 2012 Oct 30;13:94. doi: 10.1186/1471-2156-13-94.
2
A haplotype inference algorithm for trios based on deterministic sampling.
BMC Genet. 2010 Aug 23;11:78. doi: 10.1186/1471-2156-11-78.
3
HAPLOPOOL: improving haplotype frequency estimation through DNA pools and phylogenetic modeling.
Bioinformatics. 2007 Nov 15;23(22):3048-55. doi: 10.1093/bioinformatics/btm435. Epub 2007 Sep 25.
4
Maximum-parsimony haplotype frequencies inference based on a joint constrained sparse representation of pooled DNA.
BMC Bioinformatics. 2013 Sep 8;14:270. doi: 10.1186/1471-2105-14-270.
5
Estimating population haplotype frequencies from pooled SNP data using incomplete database information.
Bioinformatics. 2009 Dec 15;25(24):3296-302. doi: 10.1093/bioinformatics/btp584. Epub 2009 Oct 27.
6
PoooL: an efficient method for estimating haplotype frequencies from large DNA pools.
Bioinformatics. 2008 Sep 1;24(17):1942-8. doi: 10.1093/bioinformatics/btn324. Epub 2008 Jun 23.
7
Accurate estimation of haplotype frequency from pooled sequencing data and cost-effective identification of rare haplotype carriers by overlapping pool sequencing.
Bioinformatics. 2015 Feb 15;31(4):515-22. doi: 10.1093/bioinformatics/btu670. Epub 2014 Oct 9.
8
Estimating haplotype frequencies by combining data from large DNA pools with database information.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Jan-Mar;8(1):36-44. doi: 10.1109/TCBB.2009.71.
9
On the use of DNA pooling to estimate haplotype frequencies.
Genet Epidemiol. 2003 Jan;24(1):74-82. doi: 10.1002/gepi.10195.
10
Estimating population haplotype frequencies from pooled DNA samples using PHASE algorithm.
Genet Res (Camb). 2008 Dec;90(6):509-24. doi: 10.1017/S0016672308009877.

Cited by

1
A joint use of pooling and imputation for genotyping SNPs.
BMC Bioinformatics. 2022 Oct 13;23(1):421. doi: 10.1186/s12859-022-04974-7.
2
Regionally Smoothed Meta-Analysis Methods for GWAS Datasets.
Genet Epidemiol. 2016 Feb;40(2):154-60. doi: 10.1002/gepi.21949. Epub 2015 Dec 28.
3
A sequential Monte Carlo framework for haplotype inference in CNV/SNP genotype data.
EURASIP J Bioinform Syst Biol. 2014;2014(1):7. doi: 10.1186/1687-4153-2014-7. Epub 2014 Apr 24.
4
An EM algorithm based on an internal list for estimating haplotype distributions of rare variants from pooled genotype data.
BMC Genet. 2013 Sep 13;14:82. doi: 10.1186/1471-2156-14-82.

References cited in this article

1
Estimating haplotype frequencies by combining data from large DNA pools with database information.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Jan-Mar;8(1):36-44. doi: 10.1109/TCBB.2009.71.
2
A study of the efficiency of pooling in haplotype estimation.
Bioinformatics. 2010 Oct 15;26(20):2556-63. doi: 10.1093/bioinformatics/btq492. Epub 2010 Aug 27.
3
Estimating population haplotype frequencies from pooled SNP data using incomplete database information.
Bioinformatics. 2009 Dec 15;25(24):3296-302. doi: 10.1093/bioinformatics/btp584. Epub 2009 Oct 27.
4
Estimating population haplotype frequencies from pooled DNA samples using PHASE algorithm.
Genet Res (Camb). 2008 Dec;90(6):509-24. doi: 10.1017/S0016672308009877.
5
Computationally feasible estimation of haplotype frequencies from pooled DNA with and without Hardy-Weinberg equilibrium.
Bioinformatics. 2009 Feb 1;25(3):379-86. doi: 10.1093/bioinformatics/btn623. Epub 2008 Dec 2.
6
PoooL: an efficient method for estimating haplotype frequencies from large DNA pools.
Bioinformatics. 2008 Sep 1;24(17):1942-8. doi: 10.1093/bioinformatics/btn324. Epub 2008 Jun 23.
7
HAPLOPOOL: improving haplotype frequency estimation through DNA pools and phylogenetic modeling.
Bioinformatics. 2007 Nov 15;23(22):3048-55. doi: 10.1093/bioinformatics/btm435. Epub 2007 Sep 25.
8
Identification of the genetic basis for complex disorders by use of pooling-based genomewide single-nucleotide-polymorphism association studies.
Am J Hum Genet. 2007 Jan;80(1):126-39. doi: 10.1086/510686. Epub 2006 Dec 6.
9
Two-stage designs in case-control association analysis.
Genetics. 2006 Jul;173(3):1747-60. doi: 10.1534/genetics.105.042648. Epub 2006 Apr 19.
10
Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation.
Am J Hum Genet. 2005 Mar;76(3):449-62. doi: 10.1086/428594. Epub 2005 Jan 31.