De novo inference of stratification and local admixture in sequencing studies.

Affiliation

Department of Statistics, The Pennsylvania State University, 326 Thomas Building, University Park, PA 16802, USA.

Publication information

BMC Bioinformatics. 2013;14 Suppl 5(Suppl 5):S17. doi: 10.1186/1471-2105-14-S5-S17. Epub 2013 Apr 10.

DOI: 10.1186/1471-2105-14-S5-S17

PMID: 23734678

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3622634/

Abstract

Analysis of population structure and local ancestry along the genome has become increasingly important in population and disease genetics. With advances in next-generation sequencing technologies, the complete set of genetic variants in an individual's genome can now be generated quickly, providing unprecedented opportunities to learn population evolutionary histories and to identify local genetic signatures at SNP resolution. The success of these studies critically relies on accurate and powerful computational tools that can fully utilize the sequencing information. Although many algorithms have been developed for population structure inference and admixture mapping, many of them work only on independent SNPs in genotype or haplotype format and require a large panel of reference individuals. In this paper, we propose a novel probabilistic method for detecting population structure and local admixture. The method accepts sequencing data, genotype data, and haplotype data as input. It characterizes the dependence among genetic variants via haplotype segmentation, so that all variants detected in a sequencing study can be fully utilized for inference. The method further uses an infinite-state Bayesian Markov model to perform de novo stratification and admixture inference. Using simulated datasets from HapMap II and 1000 Genomes, we show that our method outperforms several existing algorithms, particularly when few or no reference individuals are available. Our method is applicable not only to human studies but also to studies of other species of interest for which little reference information is available. Software availability: http://stat.psu.edu/~yuzhang/software/dbm.tar.
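The abstract describes local admixture inference with a Markov model along the chromosome only at a high level. As a hedged illustration of that general idea, the sketch below is a deliberately simplified stand-in and not the paper's method: it assumes a fixed, known number K of ancestral populations with known per-SNP allele frequencies (in effect, a reference panel) and runs standard forward-backward on a single 0/1 haplotype to get the posterior ancestry at each SNP. The paper's model instead works de novo, with an infinite-state Bayesian Markov model and haplotype segmentation, neither of which is attempted here. The function name `local_ancestry_posteriors` and the parameters `freqs` and `switch_prob` are hypothetical.

```python
from typing import List, Sequence


def local_ancestry_posteriors(
    haplotype: Sequence[int],
    freqs: Sequence[Sequence[float]],
    switch_prob: float,
) -> List[List[float]]:
    """Posterior P(ancestry = k | haplotype) at each SNP via forward-backward.

    Simplified illustration only (hypothetical helper, not the paper's model):
    haplotype   -- non-empty list of 0/1 alleles at L SNPs
    freqs[k][l] -- assumed frequency of allele 1 at SNP l in population k,
                   strictly between 0 and 1
    switch_prob -- assumed probability of an ancestry switch between
                   adjacent SNPs
    """
    K, L = len(freqs), len(haplotype)

    def emit(k: int, l: int) -> float:
        # Probability of the observed allele at SNP l under population k.
        p = freqs[k][l]
        return p if haplotype[l] == 1 else 1.0 - p

    def trans(i: int, j: int) -> float:
        # Stay in the same ancestry with prob 1 - switch_prob,
        # otherwise jump uniformly to one of the other K - 1 ancestries.
        if K == 1:
            return 1.0
        return 1.0 - switch_prob if i == j else switch_prob / (K - 1)

    # Forward pass from a uniform start, normalised at every SNP.
    fwd: List[List[float]] = []
    prev = [emit(k, 0) / K for k in range(K)]
    total = sum(prev)
    prev = [x / total for x in prev]
    fwd.append(prev)
    for l in range(1, L):
        cur = [emit(j, l) * sum(prev[i] * trans(i, j) for i in range(K))
               for j in range(K)]
        total = sum(cur)
        cur = [x / total for x in cur]
        fwd.append(cur)
        prev = cur

    # Backward pass, also normalised at every SNP.
    bwd = [[1.0] * K for _ in range(L)]
    for l in range(L - 2, -1, -1):
        b = [sum(trans(i, j) * emit(j, l + 1) * bwd[l + 1][j] for j in range(K))
             for i in range(K)]
        total = sum(b)
        bwd[l] = [x / total for x in b]

    # Posterior is proportional to forward * backward at each SNP.
    post: List[List[float]] = []
    for l in range(L):
        row = [fwd[l][k] * bwd[l][k] for k in range(K)]
        total = sum(row)
        post.append([x / total for x in row])
    return post


if __name__ == "__main__":
    # Toy data: population 0 carries allele 1 often, population 1 rarely.
    hap = [1, 1, 1, 1, 0, 0, 0, 0]
    f = [[0.9] * 8, [0.1] * 8]
    for l, row in enumerate(local_ancestry_posteriors(hap, f, 0.1)):
        print(l, [round(x, 3) for x in row])
```

Normalising the forward and backward vectors at each SNP keeps the values in floating-point range on long chromosomes; because each rescaling is a constant shared by all K states at that SNP, the final normalised posterior is unchanged. In the toy run, SNPs before the switch point favour population 0 and those after it favour population 1.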

