• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

关于二倍体和多倍体基因组中单倍型组装的最小错误校正问题

On the Minimum Error Correction Problem for Haplotype Assembly in Diploid and Polyploid Genomes.

作者信息

Bonizzoni Paola, Dondi Riccardo, Klau Gunnar W, Pirola Yuri, Pisanti Nadia, Zaccaria Simone

机构信息

1 Department of Computer Science (DISCO), University of Milano-Bicocca , Milan, Italy .

2 Department of Social and Human Sciences, University of Bergamo , Bergamo, Italy .

出版信息

J Comput Biol. 2016 Sep;23(9):718-36. doi: 10.1089/cmb.2015.0220. Epub 2016 Jun 9.

DOI:10.1089/cmb.2015.0220
PMID:27280382
Abstract

In diploid genomes, haplotype assembly is the computational problem of reconstructing the two parental copies, called haplotypes, of each chromosome starting from sequencing reads, called fragments, possibly affected by sequencing errors. Minimum error correction (MEC) is a prominent computational problem for haplotype assembly and, given a set of fragments, aims at reconstructing the two haplotypes by applying the minimum number of base corrections. MEC is computationally hard to solve, but some approximation-based or fixed-parameter approaches have been proved capable of obtaining accurate results on real data. In this work, we expand the current characterization of the computational complexity of MEC from the approximation and the fixed-parameter tractability point of view. In particular, we show that MEC is not approximable within a constant factor, whereas it is approximable within a logarithmic factor in the size of the input. Furthermore, we answer open questions on the fixed-parameter tractability for parameters of classical or practical interest: the total number of corrections and the fragment length. In addition, we present a direct 2-approximation algorithm for a variant of the problem that has also been applied in the framework of clustering data. Finally, since polyploid genomes, such as those of plants and fishes, are composed of more than two copies of the chromosomes, we introduce a novel formulation of MEC, namely the k-ploid MEC problem, that extends the traditional problem to deal with polyploid genomes. We show that the novel formulation is still both computationally hard and hard to approximate. Nonetheless, from the parameterized point of view, we prove that the problem is tractable for parameters of practical interest such as the number of haplotypes and the coverage, or the number of haplotypes and the fragment length.

摘要

在二倍体基因组中,单倍型组装是一个计算问题,即从可能受测序错误影响的测序读段(称为片段)开始,重建每条染色体的两个亲本拷贝(称为单倍型)。最小错误校正(MEC)是单倍型组装中的一个突出计算问题,给定一组片段,其目标是通过应用最少数量的碱基校正来重建两个单倍型。MEC在计算上难以求解,但已证明一些基于近似或固定参数的方法能够在实际数据上获得准确结果。在这项工作中,我们从近似和固定参数可处理性的角度扩展了当前对MEC计算复杂性的刻画。具体而言,我们表明MEC在常数因子内不可近似,而在输入大小的对数因子内可近似。此外,我们回答了关于经典或实际感兴趣参数(校正总数和片段长度)的固定参数可处理性的开放性问题。此外,我们为该问题的一个变体提出了一种直接的2近似算法,该变体也已应用于聚类数据框架。最后,由于多倍体基因组,如植物和鱼类的基因组,由染色体的两个以上拷贝组成,我们引入了MEC的一种新形式,即k倍体MEC问题,它将传统问题扩展到处理多倍体基因组。我们表明新形式在计算上仍然既困难又难以近似。尽管如此,从参数化的角度来看我们证明该问题对于实际感兴趣的参数(如单倍型数量和覆盖度,或单倍型数量和片段长度)是可处理的。

相似文献

1
On the Minimum Error Correction Problem for Haplotype Assembly in Diploid and Polyploid Genomes.关于二倍体和多倍体基因组中单倍型组装的最小错误校正问题
J Comput Biol. 2016 Sep;23(9):718-36. doi: 10.1089/cmb.2015.0220. Epub 2016 Jun 9.
2
Sparse Tensor Decomposition for Haplotype Assembly of Diploids and Polyploids.二倍体和多倍体单体型组装的稀疏张量分解。
BMC Genomics. 2018 Mar 21;19(Suppl 4):191. doi: 10.1186/s12864-018-4551-y.
3
SDhaP: haplotype assembly for diploids and polyploids via semi-definite programming.SDhaP:通过半定规划进行二倍体和多倍体的单倍型组装。
BMC Genomics. 2015 Apr 3;16(1):260. doi: 10.1186/s12864-015-1408-5.
4
WhatsHap: Weighted Haplotype Assembly for Future-Generation Sequencing Reads.WhatsHap:用于下一代测序读数的加权单倍型组装
J Comput Biol. 2015 Jun;22(6):498-509. doi: 10.1089/cmb.2014.0157. Epub 2015 Feb 6.
5
H-PoP and H-PoPG: heuristic partitioning algorithms for single individual haplotyping of polyploids.H-PoP 和 H-PoPG:用于多倍体单个体单体型分析的启发式分区算法。
Bioinformatics. 2016 Dec 15;32(24):3735-3744. doi: 10.1093/bioinformatics/btw537. Epub 2016 Aug 16.
6
Haplotype reconstruction from SNP fragments by minimum error correction.通过最小错误校正从单核苷酸多态性(SNP)片段进行单倍型重建。
Bioinformatics. 2005 May 15;21(10):2456-62. doi: 10.1093/bioinformatics/bti352. Epub 2005 Feb 24.
7
Haplotype assembly in polyploid genomes and identical by descent shared tracts.多倍体基因组中的单体型组装和同源共享片段。
Bioinformatics. 2013 Jul 1;29(13):i352-60. doi: 10.1093/bioinformatics/btt213.
8
Hap10: reconstructing accurate and long polyploid haplotypes using linked reads.Hap10:利用连锁reads 重建准确和长的多倍体单倍型。
BMC Bioinformatics. 2020 Jun 18;21(1):253. doi: 10.1186/s12859-020-03584-5.
9
flopp: Extremely Fast Long-Read Polyploid Haplotype Phasing by Uniform Tree Partitioning.flopp:通过均匀树分区实现超快速长读多倍体单体型相位。
J Comput Biol. 2022 Feb;29(2):195-211. doi: 10.1089/cmb.2021.0436. Epub 2022 Jan 17.
10
GenHap: a novel computational method based on genetic algorithms for haplotype assembly.GenHap:一种基于遗传算法的新型单倍型组装计算方法。
BMC Bioinformatics. 2019 Apr 18;20(Suppl 4):172. doi: 10.1186/s12859-019-2691-y.

引用本文的文献

1
ralphi: a deep reinforcement learning framework for haplotype assembly.拉尔菲:一种用于单倍型组装的深度强化学习框架。
bioRxiv. 2025 Feb 21:2025.02.17.638151. doi: 10.1101/2025.02.17.638151.
2
Pangenome comparison via ED strings.通过编辑距离字符串进行泛基因组比较。
Front Bioinform. 2024 Sep 26;4:1397036. doi: 10.3389/fbinf.2024.1397036. eCollection 2024.
3
Floria: fast and accurate strain haplotyping in metagenomes.弗洛里亚:宏基因组中快速准确的菌株单倍型分型。
Bioinformatics. 2024 Jun 28;40(Suppl 1):i30-i38. doi: 10.1093/bioinformatics/btae252.
4
Haplotype-resolved assembly of diploid and polyploid genomes using quantum computing.利用量子计算进行二倍体和多倍体基因组的单倍型解析组装。
Cell Rep Methods. 2024 May 20;4(5):100754. doi: 10.1016/j.crmeth.2024.100754. Epub 2024 Apr 12.
5
Haplotype-resolved assembly of auto-polyploid genomes via combining Hi-C and gametic data.通过结合 Hi-C 和配子数据对自多倍体基因组进行单体型解析组装。
Sci Rep. 2024 Apr 3;14(1):7892. doi: 10.1038/s41598-024-58623-5.
6
XHap: haplotype assembly using long-distance read correlations learned by transformers.XHap:利用通过变压器学习的长距离读段相关性进行单倍型组装。
Bioinform Adv. 2023 Nov 23;3(1):vbad169. doi: 10.1093/bioadv/vbad169. eCollection 2023.
7
Recent Advances in Assembly of Complex Plant Genomes.复杂植物基因组组装的最新进展。
Genomics Proteomics Bioinformatics. 2023 Jun;21(3):427-439. doi: 10.1016/j.gpb.2023.04.004. Epub 2023 Apr 25.
8
Computational graph pangenomics: a tutorial on data structures and their applications.计算图泛基因组学:数据结构及其应用教程
Nat Comput. 2022 Mar;21(1):81-108. doi: 10.1007/s11047-022-09882-6. Epub 2022 Mar 4.
9
Karyon: a computational framework for the diagnosis of hybrids, aneuploids, and other nonstandard architectures in genome assemblies.卡戎:一个用于基因组组装中杂种、非整倍体和其他非标准结构的诊断的计算框架。
Gigascience. 2022 Oct 7;11. doi: 10.1093/gigascience/giac088.
10
flopp: Extremely Fast Long-Read Polyploid Haplotype Phasing by Uniform Tree Partitioning.flopp:通过均匀树分区实现超快速长读多倍体单体型相位。
J Comput Biol. 2022 Feb;29(2):195-211. doi: 10.1089/cmb.2021.0436. Epub 2022 Jan 17.