• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

从全基因组序列数据中进行单倍型组装的最优算法。

Optimal algorithms for haplotype assembly from whole-genome sequence data.

机构信息

Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA.

出版信息

Bioinformatics. 2010 Jun 15;26(12):i183-90. doi: 10.1093/bioinformatics/btq215.

DOI:10.1093/bioinformatics/btq215
PMID:20529904
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC2881399/
Abstract

MOTIVATION

Haplotype inference is an important step for many types of analyses of genetic variation in the human genome. Traditional approaches for obtaining haplotypes involve collecting genotype information from a population of individuals and then applying a haplotype inference algorithm. The development of high-throughput sequencing technologies allows for an alternative strategy to obtain haplotypes by combining sequence fragments. The problem of 'haplotype assembly' is the problem of assembling the two haplotypes for a chromosome given the collection of such fragments, or reads, and their locations in the haplotypes, which are pre-determined by mapping the reads to a reference genome. Errors in reads significantly increase the difficulty of the problem and it has been shown that the problem is NP-hard even for reads of length 2. Existing greedy and stochastic algorithms are not guaranteed to find the optimal solutions for the haplotype assembly problem.

RESULTS

In this article, we proposed a dynamic programming algorithm that is able to assemble the haplotypes optimally with time complexity O(m x 2(k) x n), where m is the number of reads, k is the length of the longest read and n is the total number of SNPs in the haplotypes. We also reduce the haplotype assembly problem into the maximum satisfiability problem that can often be solved optimally even when k is large. Taking advantage of the efficiency of our algorithm, we perform simulation experiments demonstrating that the assembly of haplotypes using reads of length typical of the current sequencing technologies is not practical. However, we demonstrate that the combination of this approach and the traditional haplotype phasing approaches allow us to practically construct haplotypes containing both common and rare variants.

摘要

动机

单倍型推断是人类基因组中遗传变异分析的许多类型的重要步骤。传统的方法来获得单倍型涉及从个体的人群中收集基因型信息,然后应用单倍型推断算法。高通量测序技术的发展允许通过组合序列片段获得单倍型的替代策略。“单倍型组装”问题是给定此类片段(或读取)的集合以及它们在由将读取映射到参考基因组而预先确定的单倍型中的位置,组装染色体的两个单倍型的问题。读取中的错误大大增加了问题的难度,并且已经表明,即使对于长度为 2 的读取,该问题也是 NP 难的。现有的贪婪和随机算法不能保证找到单倍型组装问题的最优解。

结果

在本文中,我们提出了一种动态规划算法,该算法能够以时间复杂度 O(m x 2(k) x n)最优地组装单倍型,其中 m 是读取的数量,k 是最长读取的长度,n 是单倍型中 SNP 的总数。我们还将单倍型组装问题简化为最大可满足性问题,即使 k 很大,通常也可以最优地解决该问题。利用我们算法的效率,我们进行了模拟实验,证明使用当前测序技术的典型长度的读取进行单倍型组装是不切实际的。但是,我们证明了这种方法与传统的单倍型相位方法相结合,使我们能够实际构建包含常见和罕见变体的单倍型。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7ad/2881399/526addb06f55/btq215f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7ad/2881399/ed90a7749ae1/btq215f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7ad/2881399/526addb06f55/btq215f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7ad/2881399/ed90a7749ae1/btq215f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7ad/2881399/526addb06f55/btq215f2.jpg

相似文献

1
Optimal algorithms for haplotype assembly from whole-genome sequence data.从全基因组序列数据中进行单倍型组装的最优算法。
Bioinformatics. 2010 Jun 15;26(12):i183-90. doi: 10.1093/bioinformatics/btq215.
2
Hap-seq: an optimal algorithm for haplotype phasing with imputation using sequencing data.Hap-seq:一种利用测序数据进行单倍型定相及插补的优化算法。
J Comput Biol. 2013 Feb;20(2):80-92. doi: 10.1089/cmb.2012.0091.
3
Exact algorithms for haplotype assembly from whole-genome sequence data.全基因组序列数据中单体型组装的精确算法。
Bioinformatics. 2013 Aug 15;29(16):1938-45. doi: 10.1093/bioinformatics/btt349. Epub 2013 Jun 18.
4
Efficient algorithms for polyploid haplotype phasing.高效的多倍体单体型相位算法。
BMC Genomics. 2018 May 9;19(Suppl 2):110. doi: 10.1186/s12864-018-4464-9.
5
HaploMaker: An improved algorithm for rapid haplotype assembly of genomic sequences.HaploMaker:一种用于快速组装基因组序列单倍型的改进算法。
Gigascience. 2022 May 17;11. doi: 10.1093/gigascience/giac038.
6
Leveraging reads that span multiple single nucleotide polymorphisms for haplotype inference from sequencing data.利用跨越多个单核苷酸多态性的读取信息,从测序数据中推断单倍型。
Bioinformatics. 2013 Sep 15;29(18):2245-52. doi: 10.1093/bioinformatics/btt386. Epub 2013 Jul 3.
7
Integrating read-based and population-based phasing for dense and accurate haplotyping of individual genomes.基于读取和基于群体的相位整合,实现个体基因组的密集和精确单倍型分型。
Bioinformatics. 2019 Jul 15;35(14):i242-i248. doi: 10.1093/bioinformatics/btz329.
8
GenHap: a novel computational method based on genetic algorithms for haplotype assembly.GenHap:一种基于遗传算法的新型单倍型组装计算方法。
BMC Bioinformatics. 2019 Apr 18;20(Suppl 4):172. doi: 10.1186/s12859-019-2691-y.
9
Hap-seqX: expedite algorithm for haplotype phasing with imputation using sequence data.Hap-seqX:使用序列数据进行导入的单倍型相位加速算法。
Gene. 2013 Apr 10;518(1):2-6. doi: 10.1016/j.gene.2012.11.093. Epub 2012 Dec 23.
10
Haplotype phasing in single-cell DNA-sequencing data.单细胞 DNA 测序数据中的单倍型相位。
Bioinformatics. 2018 Jul 1;34(13):i211-i217. doi: 10.1093/bioinformatics/bty286.

引用本文的文献

1
Mini-scale traffic flow optimization: an iterative QUBOs approach converting from hybrid solver to pure quantum processing unit.微型交通流优化:一种从混合求解器转换为纯量子处理单元的迭代二次无约束二进制优化方法。
Sci Rep. 2025 Jul 2;15(1):22904. doi: 10.1038/s41598-025-04568-2.
2
Second-Generation Digital Health Platforms: Placing the Patient at the Center and Focusing on Clinical Outcomes.第二代数字健康平台:以患者为中心并聚焦临床结果。
Front Digit Health. 2020 Dec 3;2:569178. doi: 10.3389/fdgth.2020.569178. eCollection 2020.
3
Improving Global Healthcare and Reducing Costs Using Second-Generation Artificial Intelligence-Based Digital Pills: A Market Disruptor.

本文引用的文献

1
Mapping short DNA sequencing reads and calling variants using mapping quality scores.使用比对质量分数比对短DNA测序读数并识别变异。
Genome Res. 2008 Nov;18(11):1851-8. doi: 10.1101/gr.078212.108. Epub 2008 Aug 19.
2
HapCUT: an efficient and accurate algorithm for the haplotype assembly problem.HapCUT:一种用于单倍型组装问题的高效且准确的算法。
Bioinformatics. 2008 Aug 15;24(16):i153-9. doi: 10.1093/bioinformatics/btn298.
3
An MCMC algorithm for haplotype assembly from whole-genome sequence data.一种用于从全基因组序列数据中进行单倍型组装的马尔可夫链蒙特卡罗算法。
利用第二代基于人工智能的数字药丸改善全球医疗保健并降低成本:市场颠覆者。
Int J Environ Res Public Health. 2021 Jan 19;18(2):811. doi: 10.3390/ijerph18020811.
4
ComHapDet: a spatial community detection algorithm for haplotype assembly.ComHapDet:一种用于单倍型组装的空间群落检测算法。
BMC Genomics. 2020 Sep 9;21(Suppl 9):586. doi: 10.1186/s12864-020-06935-x.
5
Minimum error correction-based haplotype assembly: Considerations for long read data.基于最小错误校正的单倍型组装:长读数据的考虑因素。
PLoS One. 2020 Jun 12;15(6):e0234470. doi: 10.1371/journal.pone.0234470. eCollection 2020.
6
A Guide to Carrying Out a Phylogenomic Target Sequence Capture Project.开展系统发育基因组目标序列捕获项目指南。
Front Genet. 2020 Feb 21;10:1407. doi: 10.3389/fgene.2019.01407. eCollection 2019.
7
Artificial intelligence for precision medicine in neurodevelopmental disorders.用于神经发育障碍精准医学的人工智能
NPJ Digit Med. 2019 Nov 21;2:112. doi: 10.1038/s41746-019-0191-0. eCollection 2019.
8
PhISCS: a combinatorial approach for subperfect tumor phylogeny reconstruction via integrative use of single-cell and bulk sequencing data.PhISCS:一种通过单细胞和批量测序数据的综合使用来重建亚完美肿瘤系统发育的组合方法。
Genome Res. 2019 Nov;29(11):1860-1877. doi: 10.1101/gr.234435.118. Epub 2019 Oct 18.
9
GenHap: a novel computational method based on genetic algorithms for haplotype assembly.GenHap:一种基于遗传算法的新型单倍型组装计算方法。
BMC Bioinformatics. 2019 Apr 18;20(Suppl 4):172. doi: 10.1186/s12859-019-2691-y.
10
HAHap: a read-based haplotyping method using hierarchical assembly.HAHap:一种使用分层组装的基于 reads 的单倍型分型方法。
PeerJ. 2018 Oct 30;6:e5852. doi: 10.7717/peerj.5852. eCollection 2018.
Genome Res. 2008 Aug;18(8):1336-46. doi: 10.1101/gr.077065.108.
4
The complete genome of an individual by massively parallel DNA sequencing.通过大规模平行DNA测序获得个体的完整基因组。
Nature. 2008 Apr 17;452(7189):872-6. doi: 10.1038/nature06884.
5
Haplotypic analysis of Wellcome Trust Case Control Consortium data.威康信托病例对照研究联盟数据的单倍型分析。
Hum Genet. 2008 Apr;123(3):273-80. doi: 10.1007/s00439-008-0472-1. Epub 2008 Jan 26.
6
A second generation human haplotype map of over 3.1 million SNPs.一张包含超过310万个单核苷酸多态性的第二代人类单倍型图谱。
Nature. 2007 Oct 18;449(7164):851-61. doi: 10.1038/nature06258.
7
The diploid genome sequence of an individual human.某个人类个体的二倍体基因组序列。
PLoS Biol. 2007 Sep 4;5(10):e254. doi: 10.1371/journal.pbio.0050254.
8
A new multipoint method for genome-wide association studies by imputation of genotypes.一种通过基因型插补进行全基因组关联研究的新的多点方法。
Nat Genet. 2007 Jul;39(7):906-13. doi: 10.1038/ng2088. Epub 2007 Jun 17.
9
Haplotype reconstruction from SNP fragments by minimum error correction.通过最小错误校正从单核苷酸多态性(SNP)片段进行单倍型重建。
Bioinformatics. 2005 May 15;21(10):2456-62. doi: 10.1093/bioinformatics/bti352. Epub 2005 Feb 24.
10
Haplotype reconstruction from genotype data using Imperfect Phylogeny.利用不完美系统发育从基因型数据中进行单倍型重建。
Bioinformatics. 2004 Aug 12;20(12):1842-9. doi: 10.1093/bioinformatics/bth149. Epub 2004 Feb 26.