HapCHAT: adaptive haplotype assembly for efficiently leveraging high coverage in long reads.

Affiliations

Department of Informatics, Systems, and Communication, University of Milano-Bicocca, Milan, Italy.

Department of Computer Science, Princeton University, Princeton, New Jersey, USA.

Publication information

BMC Bioinformatics. 2018 Jul 3;19(1):252. doi: 10.1186/s12859-018-2253-8.

DOI:10.1186/s12859-018-2253-8

PMID:29970002

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6029272/

Abstract

BACKGROUND

Haplotype assembly is the process of assigning the different alleles of the variants covered by mapped sequencing reads to the two haplotypes of the genome of a human individual. Long reads, which are now cheaper to produce and more widely available than ever before, have been used to reduce the fragmentation of the assembled haplotypes thanks to their ability to span several variants along the genome. These long reads are also characterized by a high error rate, an issue that can be mitigated with larger sets of reads when the error rate is uniform across genome positions. Unfortunately, current state-of-the-art dynamic programming approaches designed for long reads can handle only limited coverage.

RESULTS

Here, we propose a new method for assembling haplotypes which combines and extends the features of previous approaches to deal with long reads and higher coverage. In particular, our algorithm is able to dynamically adapt the estimated number of errors at each variant site, while minimizing the total number of error corrections necessary for finding a feasible solution. This significantly reduces the computational resources required, making it possible to process datasets with higher coverage. The algorithm has been implemented in a freely available tool, HapCHAT: Haplotype Assembly Coverage Handling by Adapting Thresholds. An experimental analysis on sequencing reads with up to 60× coverage reveals that considering higher coverage improves accuracy and recall, with lower runtimes.
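To make the objective of "minimizing the total number of error corrections" concrete, the following minimal sketch brute-forces that objective on a toy example. It is not HapCHAT's algorithm, whose adaptive dynamic programming is not reproduced here. The read encoding (`"0"`/`"1"` for the two alleles, `"-"` for an uncovered site), the toy reads, and the function names `mec_cost` and `best_bipartition` are assumptions made purely for illustration.

```python
from itertools import product


def mec_cost(reads, assignment):
    """Count the allele corrections needed so that all reads assigned
    to the same haplotype agree at every variant site."""
    n_sites = len(reads[0])
    cost = 0
    for group in (0, 1):
        members = [r for r, g in zip(reads, assignment) if g == group]
        for site in range(n_sites):
            # Ignore reads that do not cover this site.
            alleles = [r[site] for r in members if r[site] != "-"]
            zeros = alleles.count("0")
            ones = len(alleles) - zeros
            # Correcting the minority allele is the cheapest fix.
            cost += min(zeros, ones)
    return cost


def best_bipartition(reads):
    """Try every split of the reads into two haplotypes (exponential
    in the number of reads, so only suitable for toy inputs)."""
    best = None
    # Fix the first read to haplotype 0 to skip mirror-image solutions.
    for rest in product((0, 1), repeat=len(reads) - 1):
        assignment = (0,) + rest
        cost = mec_cost(reads, assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


# Hypothetical reads over four variant sites; the second read carries
# one erroneous allele at the last site.
reads = ["0011", "0010", "1100", "110-", "-011"]
cost, assignment = best_bipartition(reads)
print(cost, assignment)
```

On this input, the cheapest split groups the first, second, and fifth reads on one haplotype and the remaining two on the other, at the cost of a single correction. The exhaustive search doubles in size with every additional read, which illustrates why coverage is a limiting factor for exact approaches.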

CONCLUSIONS

Our method leverages the long-range information of sequencing reads, which yields assembled haplotypes fragmented into fewer unphased haplotype blocks. At the same time, our method can handle higher coverage, allowing it to better correct the errors in the original reads and, as a result, to obtain more accurate haplotypes.

AVAILABILITY

HapCHAT is available at http://hapchat.algolab.eu under the GNU General Public License (GPL).
