PWHATSHAP: efficient haplotyping for future generation sequencing.

Author information

Bracciali Andrea, Aldinucci Marco, Patterson Murray, Marschall Tobias, Pisanti Nadia, Merelli Ivan, Torquati Massimo

Affiliations

Computer Science and Mathematics, School of Natural Sciences, Stirling University, Stirling, FK9 4LA, UK.

Department of Computer Science, University of Torino, Torino, Italy.

Publication information

BMC Bioinformatics. 2016 Sep 22;17(Suppl 11):342. doi: 10.1186/s12859-016-1170-y.

DOI:10.1186/s12859-016-1170-y

PMID:28185544

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5046197/

Abstract

BACKGROUND

Haplotype phasing is an important problem in the analysis of genomics information. Given a set of DNA fragments of an individual, it consists of determining which one of the possible alleles (alternative forms of a gene) each fragment comes from. Haplotype information is relevant to gene regulation, epigenetics, genome-wide association studies, evolutionary and population studies, and the study of mutations. Haplotyping is currently addressed as an optimisation problem aiming at solutions that minimise, for instance, error correction costs, where costs are a measure of the confidence in the accuracy of the information acquired from DNA sequencing. Solutions typically have exponential computational complexity. WHATSHAP is a recent optimal approach which moves computational complexity from DNA fragment length to fragment overlap, i.e., coverage, and is hence of particular interest given the current trend in sequencing technology towards longer fragments.
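The optimisation view described above can be illustrated with a minimal sketch. It assumes a weighted minimum error correction style objective: reads are split into two groups (one per haplotype), and the cost of a split is the total weight of the allele calls that must be flipped so that each group agrees at every site. The function names, the read encoding, and the toy data are hypothetical, and the exhaustive search shown here is a naive baseline for illustration only, not the algorithm used by WHATSHAP or PWHATSHAP.

```python
from itertools import product


def column_cost(entries):
    """Cheapest way to make all (allele, weight) entries agree on 0 or on 1."""
    cost_all_zero = sum(w for allele, w in entries if allele == 1)
    cost_all_one = sum(w for allele, w in entries if allele == 0)
    return min(cost_all_zero, cost_all_one)


def wmec_brute_force(reads, n_sites):
    """Try every bipartition of the reads and return (best_cost, best_partition).

    reads: list of dicts mapping site index -> (allele, weight); a site that
    is missing from a dict is not covered by that read.
    """
    best_cost, best_partition = None, None
    for partition in product((0, 1), repeat=len(reads)):
        # Swapping the two groups gives the same cost, so fix the first read.
        if partition and partition[0] == 1:
            continue
        cost = 0
        for site in range(n_sites):
            for side in (0, 1):
                entries = [read[site] for read, p in zip(reads, partition)
                           if p == side and site in read]
                cost += column_cost(entries)
        if best_cost is None or cost < best_cost:
            best_cost, best_partition = cost, partition
    return best_cost, best_partition


# Hypothetical toy input: four reads over three variant sites.
# The last read carries a low-confidence (weight 3) call at site 2.
reads = [
    {0: (0, 10), 1: (0, 10)},
    {0: (0, 10), 1: (0, 10), 2: (1, 10)},
    {0: (1, 10), 1: (1, 10), 2: (0, 10)},
    {1: (1, 10), 2: (1, 3)},
]
cost, partition = wmec_brute_force(reads, n_sites=3)
```

On this toy input the cheapest split groups the first two reads together and the last two together, at the price of correcting the single low-weight call. The search enumerates every bipartition of the reads, which is why naive approaches become infeasible quickly and why the choice of the parameter that drives the exponential term matters.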

RESULTS

Given the potential relevance of efficient haplotyping in several analysis pipelines, we have designed and engineered PWHATSHAP, a parallel, high-performance version of WHATSHAP. PWHATSHAP is embedded in a toolkit developed in Python and supports genomics datasets in standard file formats. Building on WHATSHAP, PWHATSHAP exhibits the same complexity, exploring a number of possible solutions that is exponential in the coverage of the dataset. The parallel implementation on multi-core architectures allows for a substantial reduction of the execution time for haplotyping, while the results retain the same high accuracy as those of WHATSHAP, which increases with coverage.
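To make the dependence on coverage concrete, the following sketch counts, for each variant site, how many reads cover it and how many ways those reads can be split into two groups. The encoding of a read as a set of covered site indices and the function names are assumptions made for illustration; the snippet only demonstrates the growth of the search space and says nothing about how PWHATSHAP organises or parallelises the work.

```python
def coverage_per_site(reads, n_sites):
    """Number of reads covering each site; each read is a set of site indices."""
    return [sum(1 for read in reads if site in read) for site in range(n_sites)]


def bipartitions_per_site(reads, n_sites):
    """Ways to split the reads covering each site into two groups (2 ** coverage)."""
    return [2 ** c for c in coverage_per_site(reads, n_sites)]


# Hypothetical toy input: four reads over three sites.
reads = [{0, 1}, {0, 1, 2}, {0, 1, 2}, {1, 2}]
coverage = coverage_per_site(reads, n_sites=3)
splits = bipartitions_per_site(reads, n_sites=3)
```

Each additional read covering a site doubles that site's count, whereas making the existing reads longer leaves the per-site counts unchanged. This is the sense in which the cost is tied to coverage and not to fragment length.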

CONCLUSIONS

Due to its structure and its management of large datasets, the parallelisation of WHATSHAP posed demanding technical challenges, which have been addressed by exploiting a high-level parallel programming framework. The result, PWHATSHAP, is a freely available toolkit that improves the efficiency of the analysis of genomics information.
