PWHATSHAP: efficient haplotyping for future generation sequencing.

Authors

Bracciali Andrea, Aldinucci Marco, Patterson Murray, Marschall Tobias, Pisanti Nadia, Merelli Ivan, Torquati Massimo

Affiliations

Computer Science and Mathematics, School of Natural Sciences, Stirling University, Stirling, FK9 4LA, UK.

Department of Computer Science, University of Torino, Torino, Italy.

Publication

BMC Bioinformatics. 2016 Sep 22;17(Suppl 11):342. doi: 10.1186/s12859-016-1170-y.

Abstract

BACKGROUND

Haplotype phasing is an important problem in the analysis of genomics information. Given a set of DNA fragments from an individual, it consists of determining which of the possible alleles (alternative forms of a gene) each fragment comes from. Haplotype information is relevant to gene regulation, epigenetics, genome-wide association studies, evolutionary and population studies, and the study of mutations. Haplotyping is currently addressed as an optimisation problem whose solutions minimise, for instance, error correction costs, where costs measure the confidence in the accuracy of the information acquired from DNA sequencing. Solutions typically have exponential computational complexity. WHATSHAP is a recent optimal approach that shifts the computational complexity from DNA fragment length to fragment overlap, i.e., coverage, and is hence of particular interest given that current sequencing technologies are producing ever longer fragments.
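To make the optimisation view concrete, the following is a minimal sketch of weighted minimum error correction on a toy dataset. It is an exhaustive search written for illustration only, not the dynamic programming used by WHATSHAP, and every name in it (`side_cost`, `brute_force_wmec`, the fragment encoding) is a hypothetical choice rather than part of any published tool. Each fragment maps a variant position to an (allele, weight) pair, where the weight stands for the cost of correcting that allele call.

```python
from itertools import product

def side_cost(fragments, positions):
    """Cheapest cost of making all fragments on one haplotype agree everywhere."""
    total = 0
    for pos in positions:
        weight = {0: 0, 1: 0}
        for frag in fragments:
            if pos in frag:
                allele, w = frag[pos]
                weight[allele] += w
        # Keep the heavier allele and pay to correct the lighter one.
        total += min(weight[0], weight[1])
    return total

def brute_force_wmec(fragments):
    """Try every assignment of fragments to two haplotypes (illustration only)."""
    positions = sorted({p for frag in fragments for p in frag})
    best_cost, best_assignment = None, None
    for assignment in product((0, 1), repeat=len(fragments)):
        cost = 0
        for side in (0, 1):
            members = [f for f, s in zip(fragments, assignment) if s == side]
            cost += side_cost(members, positions)
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment

# Toy data (invented): position -> (allele, correction weight).
toy_fragments = [
    {0: (0, 5), 1: (1, 5)},
    {0: (0, 4), 1: (1, 3), 2: (0, 6)},
    {1: (0, 5), 2: (1, 5)},
    {0: (1, 4), 1: (0, 2), 2: (0, 1)},  # low-weight call at position 2
]

cost, assignment = brute_force_wmec(toy_fragments)
print(cost, assignment)
```

On this toy input the cheapest solution groups the first two fragments on one haplotype and the last two on the other, correcting only the low-weight call. The search visits every one of the 2^n assignments of n fragments, which is why practical tools need a smarter strategy.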

RESULTS

Given the potential relevance of efficient haplotyping in several analysis pipelines, we have designed and engineered PWHATSHAP, a parallel, high-performance version of WHATSHAP. PWHATSHAP is embedded in a toolkit developed in Python and supports genomics datasets in standard file formats. Building on WHATSHAP, PWHATSHAP has the same complexity, exploring a number of possible solutions that is exponential in the coverage of the dataset. The parallel implementation on multi-core architectures significantly reduces the execution time for haplotyping, while its results retain the same high accuracy as those of WHATSHAP, which increases with coverage.
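The dependence on coverage can be illustrated with a small sketch. It assumes a simplified model in which, at each variant position, the fragments covering that position can be split between the two haplotypes in up to 2^c ways, where c is the coverage at that position. The function names and the toy reads are hypothetical, and the count is only an upper bound on what an actual implementation would examine.

```python
def column_coverage(fragments):
    """Count how many fragments cover each variant position."""
    coverage = {}
    for frag in fragments:
        for pos in frag:
            coverage[pos] = coverage.get(pos, 0) + 1
    return coverage

def bipartitions_per_column(fragments):
    """Upper bound on the two-way splits of the fragments at each position."""
    return {pos: 2 ** c for pos, c in sorted(column_coverage(fragments).items())}

# Toy reads (invented): each read is the set of variant positions it covers.
toy_reads = [{0, 1}, {0, 1, 2}, {1, 2}, {0, 1, 2}]

print(column_coverage(toy_reads))
print(bipartitions_per_column(toy_reads))
```

Under this model, adding one more overlapping read at a position doubles the number of splits to consider there, which gives an intuition for why both execution time and the benefit of parallel evaluation grow with coverage.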

CONCLUSIONS

Because of its structure and its management of large datasets, parallelising WHATSHAP posed demanding technical challenges, which were addressed by exploiting a high-level parallel programming framework. The result, PWHATSHAP, is a freely available toolkit that improves the efficiency of the analysis of genomics information.