Joint haplotype assembly and genotype calling via sequential Monte Carlo algorithm.

Author information

Ahn Soyeon, Vikalo Haris

Affiliation

Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, 78712, Texas, USA.

Publication information

BMC Bioinformatics. 2015 Jul 16;16:223. doi: 10.1186/s12859-015-0651-8.

DOI:10.1186/s12859-015-0651-8

PMID:26178880

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4503296/

Abstract

BACKGROUND

Genetic variations predispose individuals to hereditary diseases, play an important role in the development of complex diseases, and impact drug metabolism. The full information about the DNA variations in the genome of an individual is given by haplotypes, the ordered lists of single nucleotide polymorphisms (SNPs) located on chromosomes. Affordable high-throughput DNA sequencing technologies enable routine acquisition of the data needed for the assembly of single individual haplotypes. However, state-of-the-art high-throughput sequencing platforms generate erroneous data, which induces uncertainty in the SNP and genotype calling procedures and, ultimately, adversely affects the accuracy of haplotyping. When inferring haplotype phase information, the vast majority of existing haplotype assembly techniques assume that the genotype information is correct. This motivates the development of methods capable of joint genotype calling and haplotype assembly.

RESULTS

We present a haplotype assembly algorithm, ParticleHap, that relies on a probabilistic description of the sequencing data to jointly infer genotypes and assemble the most likely haplotypes. Our method employs a deterministic sequential Monte Carlo algorithm that associates single nucleotide polymorphisms with haplotypes by exhaustively exploring all possible extensions of the partial haplotypes. The algorithm relies on genotype likelihoods rather than on often erroneously called genotypes, thus ensuring a more accurate assembly of the haplotypes. Results on both 1000 Genomes Project experimental data and simulation studies demonstrate that the proposed approach enables highly accurate solutions to the haplotype assembly problem while being computationally efficient and scalable, generally outperforming existing methods in terms of both accuracy and speed.

CONCLUSIONS

The developed probabilistic framework and sequential Monte Carlo algorithm enable joint haplotype assembly and genotyping in a computationally efficient manner. Our results demonstrate fast and highly accurate haplotype assembly aided by the re-examination of erroneously called genotypes. A C code implementation of ParticleHap will be available for download from https://sites.google.com/site/asynoeun/particlehap.
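
ILLUSTRATIVE SKETCH

The abstract describes a deterministic sequential Monte Carlo search that extends partial haplotypes one SNP at a time, tries every possible extension, and scores candidates with read likelihoods instead of fixed genotype calls. The Python sketch below illustrates that general idea only; it is not the ParticleHap implementation (which the authors describe as C code). Everything in it is an assumption made for illustration: the function names, the diploid biallelic setting, the independent per-base error rate `err`, the equal chance that a read comes from either haplotype, and the `max_particles` pruning limit.

```python
import math
from itertools import product


def read_log_lik(read, hap, err):
    """log P(read | hap), using only the SNP sites already present in `hap`."""
    return sum(
        math.log(1.0 - err if hap[pos] == allele else err)
        for pos, allele in read.items()
        if pos < len(hap)
    )


def pair_log_lik(reads, h1, h2, err):
    """Log-likelihood of all reads, each drawn from h1 or h2 with probability 1/2."""
    total = 0.0
    for read in reads:
        l1 = read_log_lik(read, h1, err)
        l2 = read_log_lik(read, h2, err)
        m = max(l1, l2)  # factor out the larger term for numerical stability
        total += m + math.log(0.5 * math.exp(l1 - m) + 0.5 * math.exp(l2 - m))
    return total


def assemble(reads, n_snps, err=0.01, max_particles=8):
    """Extend partial haplotype pairs site by site, keeping the best candidates.

    Each particle is a pair of partial haplotypes. At every SNP, all four
    allele pairs (0,0), (0,1), (1,0), (1,1) are tried, so the genotype is
    inferred together with the phase rather than assumed in advance.
    """
    particles = [((), ())]
    for _ in range(n_snps):
        scored = []
        for h1, h2 in particles:
            for a1, a2 in product((0, 1), repeat=2):
                cand = (h1 + (a1,), h2 + (a2,))
                scored.append((pair_log_lik(reads, *cand, err), cand))
        # Deterministic selection: keep the highest-scoring extensions.
        scored.sort(key=lambda s: s[0], reverse=True)
        particles = [cand for _, cand in scored[:max_particles]]
    return particles[0]


# Toy input (hypothetical): reads map SNP index -> observed allele.
# The last read carries a sequencing error at SNP index 2.
EXAMPLE_READS = [
    {0: 0, 1: 1}, {1: 1, 2: 1, 3: 0}, {0: 0, 1: 1, 2: 1},
    {0: 1, 1: 0}, {1: 0, 2: 1, 3: 1}, {0: 1, 1: 0, 2: 1},
    {2: 0, 3: 1},
]
```

Calling `assemble(EXAMPLE_READS, n_snps=4)` returns a pair of haplotypes; summing the two alleles at each site gives the implied genotype. The order of the two haplotypes in the returned pair is arbitrary, because swapping them leaves the likelihood unchanged. Rescoring the full prefix at every step keeps the sketch short; a practical implementation would update the weights incrementally.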

Similar articles

Leveraging reads that span multiple single nucleotide polymorphisms for haplotype inference from sequencing data.

Bioinformatics. 2013 Sep 15;29(18):2245-52. doi: 10.1093/bioinformatics/btt386. Epub 2013 Jul 3.

Decoding Genetic Variations: Communications-Inspired Haplotype Assembly.

IEEE/ACM Trans Comput Biol Bioinform. 2016 May-Jun;13(3):518-30. doi: 10.1109/TCBB.2015.2462367.

Joint haplotype phasing and genotype calling of multiple individuals using haplotype informative reads.

Bioinformatics. 2013 Oct 1;29(19):2427-34. doi: 10.1093/bioinformatics/btt418. Epub 2013 Aug 13.

GenHap: a novel computational method based on genetic algorithms for haplotype assembly.

BMC Bioinformatics. 2019 Apr 18;20(Suppl 4):172. doi: 10.1186/s12859-019-2691-y.

MixSIH: a mixture model for single individual haplotyping.

BMC Genomics. 2013;14 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2164-14-S2-S5. Epub 2013 Feb 15.

Progressive approach for SNP calling and haplotype assembly using single molecular sequencing data.

Bioinformatics. 2018 Jun 15;34(12):2012-2018. doi: 10.1093/bioinformatics/bty059.

HapCompass: a fast cycle basis algorithm for accurate haplotype assembly of sequence data.

J Comput Biol. 2012 Jun;19(6):577-90. doi: 10.1089/cmb.2012.0084.

Integrating read-based and population-based phasing for dense and accurate haplotyping of individual genomes.

Bioinformatics. 2019 Jul 15;35(14):i242-i248. doi: 10.1093/bioinformatics/btz329.

HaploMaker: An improved algorithm for rapid haplotype assembly of genomic sequences.

Gigascience. 2022 May 17;11. doi: 10.1093/gigascience/giac038.

Cited by

Pairwise comparative analysis of six haplotype assembly methods based on users' experience.

BMC Genom Data. 2023 Jun 29;24(1):35. doi: 10.1186/s12863-023-01134-5.

Better ILP models for haplotype assembly.

BMC Bioinformatics. 2018 Feb 19;19(Suppl 1):52. doi: 10.1186/s12859-018-2012-x.
