
PWHATSHAP: efficient haplotyping for future generation sequencing.

Authors

Bracciali Andrea, Aldinucci Marco, Patterson Murray, Marschall Tobias, Pisanti Nadia, Merelli Ivan, Torquati Massimo

Affiliations

Computer Science and Mathematics, School of Natural Sciences, Stirling University, Stirling, FK9 4LA, UK.

Department of Computer Science, University of Torino, Torino, Italy.

Publication

BMC Bioinformatics. 2016 Sep 22;17(Suppl 11):342. doi: 10.1186/s12859-016-1170-y.

DOI:10.1186/s12859-016-1170-y
PMID:28185544
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5046197/
Abstract

BACKGROUND

Haplotype phasing is an important problem in the analysis of genomics information. Given a set of DNA fragments of an individual, it consists of determining which one of the possible alleles (alternative forms of a gene) each fragment comes from. Haplotype information is relevant to gene regulation, epigenetics, genome-wide association studies, evolutionary and population studies, and the study of mutations. Haplotyping is currently addressed as an optimisation problem aiming at solutions that minimise, for instance, error correction costs, where costs are a measure of the confidence in the accuracy of the information acquired from DNA sequencing. Solutions typically have exponential computational complexity. WHATSHAP is a recent optimal approach that moves computational complexity from DNA fragment length to fragment overlap, i.e., coverage, and is hence of particular interest given the current trend in sequencing technology towards longer fragments.
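
To make the optimisation view concrete, the following is a minimal brute-force sketch of weighted minimum error correction on a toy dataset. It is not the WHATSHAP algorithm, which relies on dynamic programming; the read encoding, the function names `column_cost` and `wmec_brute_force`, and the weights are all assumptions made for illustration. Each read maps a variant position to an (allele, weight) pair, where the weight stands for the confidence in that call, and the search tries every split of the reads into two haplotype groups.

```python
from itertools import product


def column_cost(entries):
    """Cheapest way to make the calls of one group agree at one position.

    entries: list of (allele, weight) pairs with allele in {0, 1}.
    """
    cost_to_0 = sum(w for a, w in entries if a == 1)  # flip every 1 to 0
    cost_to_1 = sum(w for a, w in entries if a == 0)  # flip every 0 to 1
    return min(cost_to_0, cost_to_1)


def wmec_brute_force(reads):
    """Try every bipartition of the reads and keep the cheapest one.

    reads: list of dicts {variant_position: (allele, weight)} (hypothetical encoding).
    Returns (minimum_cost, assignment), where assignment[i] is the group of read i.
    """
    if not reads:
        return 0, ()
    positions = sorted({p for read in reads for p in read})
    best_cost, best_assignment = None, None
    # The first read is pinned to group 0 because swapping the groups changes nothing.
    for rest in product((0, 1), repeat=len(reads) - 1):
        assignment = (0,) + rest
        cost = 0
        for group in (0, 1):
            for p in positions:
                entries = [r[p] for r, g in zip(reads, assignment) if g == group and p in r]
                cost += column_cost(entries)
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment


# Toy data: the last read carries one low-confidence call (weight 3) at position 1.
reads = [
    {0: (0, 10), 1: (0, 10)},
    {0: (0, 10), 1: (0, 10), 2: (1, 10)},
    {1: (1, 10), 2: (0, 10)},
    {0: (1, 10), 1: (1, 10), 2: (0, 10)},
    {0: (0, 10), 1: (1, 3)},
]
print(wmec_brute_force(reads))  # (3, (0, 0, 1, 1, 0))
```

On this toy input the cheapest solution corrects only the low-confidence call in the last read, at a cost of 3. The loop enumerates 2^(n-1) splits for n reads, so this enumeration is feasible only for tiny inputs.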

RESULTS

Given the potential relevance of efficient haplotyping in several analysis pipelines, we have designed and engineered PWHATSHAP, a parallel, high-performance version of WHATSHAP. PWHATSHAP is embedded in a toolkit developed in Python and supports genomics datasets in standard file formats. Building on WHATSHAP, PWHATSHAP exhibits the same complexity, exploring a number of possible solutions that is exponential in the coverage of the dataset. The parallel implementation on multi-core architectures allows for a substantial reduction of the execution time for haplotyping, while the results retain the same high accuracy as those of WHATSHAP, which increases with coverage.
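
As a rough illustration of why coverage is the critical parameter, the sketch below counts how many reads cover each variant position and how many ways those reads can be split into two groups at that position. The helper names and the toy reads are hypothetical, and the 2^c count is a simplification meant only to show exponential growth in coverage; it does not reproduce the exact state space of WHATSHAP or PWHATSHAP.

```python
from collections import Counter


def coverage_per_position(reads):
    """Count the reads that report an allele at each variant position.

    reads: list of sets of variant positions (hypothetical encoding).
    """
    counts = Counter()
    for read in reads:
        counts.update(set(read))
    return dict(counts)


def splits_per_position(coverage):
    """Number of ways to assign c reads to two groups: 2**c (simplified count)."""
    return {p: 2 ** c for p, c in coverage.items()}


# Toy data: five reads over three variant positions.
toy_reads = [{0, 1}, {0, 1, 2}, {1, 2}, {0, 1, 2}, {0, 1}]
coverage = coverage_per_position(toy_reads)
splits = splits_per_position(coverage)

max_cov = max(coverage.values())
print(max_cov, max(splits.values()))  # 5 32
```

Adding one more read over the busiest position doubles the count, whereas lengthening reads without raising coverage leaves it unchanged in this simplified model.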

CONCLUSIONS

Due to its structure and its management of large datasets, the parallelisation of WHATSHAP posed demanding technical challenges, which have been addressed by exploiting a high-level parallel programming framework. The result, PWHATSHAP, is a freely available toolkit that improves the efficiency of the analysis of genomics information.

