从全基因组序列数据中进行单倍型组装的最优算法。

Optimal algorithms for haplotype assembly from whole-genome sequence data.

机构信息

Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA.

出版信息

Bioinformatics. 2010 Jun 15;26(12):i183-90. doi: 10.1093/bioinformatics/btq215.

DOI:10.1093/bioinformatics/btq215

PMID:20529904

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC2881399/

Abstract

MOTIVATION

Haplotype inference is an important step for many types of analyses of genetic variation in the human genome. Traditional approaches for obtaining haplotypes involve collecting genotype information from a population of individuals and then applying a haplotype inference algorithm. The development of high-throughput sequencing technologies allows for an alternative strategy to obtain haplotypes by combining sequence fragments. The problem of 'haplotype assembly' is the problem of assembling the two haplotypes for a chromosome given the collection of such fragments, or reads, and their locations in the haplotypes, which are pre-determined by mapping the reads to a reference genome. Errors in reads significantly increase the difficulty of the problem and it has been shown that the problem is NP-hard even for reads of length 2. Existing greedy and stochastic algorithms are not guaranteed to find the optimal solutions for the haplotype assembly problem.

RESULTS

In this article, we proposed a dynamic programming algorithm that is able to assemble the haplotypes optimally with time complexity O(m x 2(k) x n), where m is the number of reads, k is the length of the longest read and n is the total number of SNPs in the haplotypes. We also reduce the haplotype assembly problem into the maximum satisfiability problem that can often be solved optimally even when k is large. Taking advantage of the efficiency of our algorithm, we perform simulation experiments demonstrating that the assembly of haplotypes using reads of length typical of the current sequencing technologies is not practical. However, we demonstrate that the combination of this approach and the traditional haplotype phasing approaches allow us to practically construct haplotypes containing both common and rare variants.

摘要

动机

单倍型推断是人类基因组中遗传变异分析的许多类型的重要步骤。传统的方法来获得单倍型涉及从个体的人群中收集基因型信息，然后应用单倍型推断算法。高通量测序技术的发展允许通过组合序列片段获得单倍型的替代策略。“单倍型组装”问题是给定此类片段（或读取）的集合以及它们在由将读取映射到参考基因组而预先确定的单倍型中的位置，组装染色体的两个单倍型的问题。读取中的错误大大增加了问题的难度，并且已经表明，即使对于长度为 2 的读取，该问题也是 NP 难的。现有的贪婪和随机算法不能保证找到单倍型组装问题的最优解。

结果

在本文中，我们提出了一种动态规划算法，该算法能够以时间复杂度 O(m x 2(k) x n)最优地组装单倍型，其中 m 是读取的数量，k 是最长读取的长度，n 是单倍型中 SNP 的总数。我们还将单倍型组装问题简化为最大可满足性问题，即使 k 很大，通常也可以最优地解决该问题。利用我们算法的效率，我们进行了模拟实验，证明使用当前测序技术的典型长度的读取进行单倍型组装是不切实际的。但是，我们证明了这种方法与传统的单倍型相位方法相结合，使我们能够实际构建包含常见和罕见变体的单倍型。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7ad/2881399/ed90a7749ae1/btq215f1.jpg

相似文献

Optimal algorithms for haplotype assembly from whole-genome sequence data.

Bioinformatics. 2010 Jun 15;26(12):i183-90. doi: 10.1093/bioinformatics/btq215.

Hap-seq: an optimal algorithm for haplotype phasing with imputation using sequencing data.

J Comput Biol. 2013 Feb;20(2):80-92. doi: 10.1089/cmb.2012.0091.

Exact algorithms for haplotype assembly from whole-genome sequence data.

Bioinformatics. 2013 Aug 15;29(16):1938-45. doi: 10.1093/bioinformatics/btt349. Epub 2013 Jun 18.

Efficient algorithms for polyploid haplotype phasing.

BMC Genomics. 2018 May 9;19(Suppl 2):110. doi: 10.1186/s12864-018-4464-9.

HaploMaker: An improved algorithm for rapid haplotype assembly of genomic sequences.

Gigascience. 2022 May 17;11. doi: 10.1093/gigascience/giac038.

Leveraging reads that span multiple single nucleotide polymorphisms for haplotype inference from sequencing data.

Bioinformatics. 2013 Sep 15;29(18):2245-52. doi: 10.1093/bioinformatics/btt386. Epub 2013 Jul 3.

Integrating read-based and population-based phasing for dense and accurate haplotyping of individual genomes.

Bioinformatics. 2019 Jul 15;35(14):i242-i248. doi: 10.1093/bioinformatics/btz329.

GenHap: a novel computational method based on genetic algorithms for haplotype assembly.

BMC Bioinformatics. 2019 Apr 18;20(Suppl 4):172. doi: 10.1186/s12859-019-2691-y.

Hap-seqX: expedite algorithm for haplotype phasing with imputation using sequence data.

Gene. 2013 Apr 10;518(1):2-6. doi: 10.1016/j.gene.2012.11.093. Epub 2012 Dec 23.

Haplotype phasing in single-cell DNA-sequencing data.

Bioinformatics. 2018 Jul 1;34(13):i211-i217. doi: 10.1093/bioinformatics/bty286.

引用本文的文献

Mini-scale traffic flow optimization: an iterative QUBOs approach converting from hybrid solver to pure quantum processing unit.

Sci Rep. 2025 Jul 2;15(1):22904. doi: 10.1038/s41598-025-04568-2.

Second-Generation Digital Health Platforms: Placing the Patient at the Center and Focusing on Clinical Outcomes.

Front Digit Health. 2020 Dec 3;2:569178. doi: 10.3389/fdgth.2020.569178. eCollection 2020.

Improving Global Healthcare and Reducing Costs Using Second-Generation Artificial Intelligence-Based Digital Pills: A Market Disruptor.

Int J Environ Res Public Health. 2021 Jan 19;18(2):811. doi: 10.3390/ijerph18020811.

ComHapDet: a spatial community detection algorithm for haplotype assembly.

BMC Genomics. 2020 Sep 9;21(Suppl 9):586. doi: 10.1186/s12864-020-06935-x.

Minimum error correction-based haplotype assembly: Considerations for long read data.

PLoS One. 2020 Jun 12;15(6):e0234470. doi: 10.1371/journal.pone.0234470. eCollection 2020.

A Guide to Carrying Out a Phylogenomic Target Sequence Capture Project.

Front Genet. 2020 Feb 21;10:1407. doi: 10.3389/fgene.2019.01407. eCollection 2019.

Artificial intelligence for precision medicine in neurodevelopmental disorders.

NPJ Digit Med. 2019 Nov 21;2:112. doi: 10.1038/s41746-019-0191-0. eCollection 2019.

PhISCS: a combinatorial approach for subperfect tumor phylogeny reconstruction via integrative use of single-cell and bulk sequencing data.

Genome Res. 2019 Nov;29(11):1860-1877. doi: 10.1101/gr.234435.118. Epub 2019 Oct 18.

GenHap: a novel computational method based on genetic algorithms for haplotype assembly.

BMC Bioinformatics. 2019 Apr 18;20(Suppl 4):172. doi: 10.1186/s12859-019-2691-y.

HAHap: a read-based haplotyping method using hierarchical assembly.

PeerJ. 2018 Oct 30;6:e5852. doi: 10.7717/peerj.5852. eCollection 2018.

本文引用的文献

Mapping short DNA sequencing reads and calling variants using mapping quality scores.

Genome Res. 2008 Nov;18(11):1851-8. doi: 10.1101/gr.078212.108. Epub 2008 Aug 19.

HapCUT: an efficient and accurate algorithm for the haplotype assembly problem.

Bioinformatics. 2008 Aug 15;24(16):i153-9. doi: 10.1093/bioinformatics/btn298.

An MCMC algorithm for haplotype assembly from whole-genome sequence data.

Genome Res. 2008 Aug;18(8):1336-46. doi: 10.1101/gr.077065.108.

The complete genome of an individual by massively parallel DNA sequencing.

Nature. 2008 Apr 17;452(7189):872-6. doi: 10.1038/nature06884.

Haplotypic analysis of Wellcome Trust Case Control Consortium data.

Hum Genet. 2008 Apr;123(3):273-80. doi: 10.1007/s00439-008-0472-1. Epub 2008 Jan 26.

A second generation human haplotype map of over 3.1 million SNPs.

Nature. 2007 Oct 18;449(7164):851-61. doi: 10.1038/nature06258.

The diploid genome sequence of an individual human.

PLoS Biol. 2007 Sep 4;5(10):e254. doi: 10.1371/journal.pbio.0050254.

A new multipoint method for genome-wide association studies by imputation of genotypes.

Nat Genet. 2007 Jul;39(7):906-13. doi: 10.1038/ng2088. Epub 2007 Jun 17.

Haplotype reconstruction from SNP fragments by minimum error correction.

Bioinformatics. 2005 May 15;21(10):2456-62. doi: 10.1093/bioinformatics/bti352. Epub 2005 Feb 24.

Haplotype reconstruction from genotype data using Imperfect Phylogeny.

Bioinformatics. 2004 Aug 12;20(12):1842-9. doi: 10.1093/bioinformatics/bth149. Epub 2004 Feb 26.

文献AI研究员

20分钟写一篇综述，助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型，支持多种主流文档格式。

立即体验

从全基因组序列数据中进行单倍型组装的最优算法。

Optimal algorithms for haplotype assembly from whole-genome sequence data.

机构信息

Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA.

出版信息

Bioinformatics. 2010 Jun 15;26(12):i183-90. doi: 10.1093/bioinformatics/btq215.

DOI:10.1093/bioinformatics/btq215

PMID:20529904

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC2881399/

Abstract

MOTIVATION

RESULTS

摘要

从全基因组序列数据中进行单倍型组装的最优算法。

Optimal algorithms for haplotype assembly from whole-genome sequence data.

机构信息

出版信息

MOTIVATION

RESULTS

动机

结果

相似文献

引用本文的文献

本文引用的文献

文献AI研究员

用中文搜PubMed

文档翻译

Suppr 超能文献

从全基因组序列数据中进行单倍型组装的最优算法。

Optimal algorithms for haplotype assembly from whole-genome sequence data.

机构信息

出版信息

MOTIVATION

RESULTS

动机

结果

相似文献

引用本文的文献

本文引用的文献