HapCompass: a fast cycle basis algorithm for accurate haplotype assembly of sequence data.

Author information

Aguiar Derek, Istrail Sorin

Affiliation

Department of Computer Science, Brown University, Providence RI 02912, USA.

Publication information

J Comput Biol. 2012 Jun;19(6):577-90. doi: 10.1089/cmb.2012.0084.

Abstract

Genome assembly methods produce haplotype phase ambiguous assemblies due to limitations in current sequencing technologies. Determining the haplotype phase of an individual is computationally challenging and experimentally expensive. However, haplotype phase information is crucial in many bioinformatics workflows such as genetic association studies and genomic imputation. Current computational methods of determining haplotype phase from sequence data--known as haplotype assembly--have difficulties producing accurate results for large (1000 genomes-type) data or operate on restricted optimizations that are unrealistic considering modern high-throughput sequencing technologies. We present a novel algorithm, HapCompass, for haplotype assembly of densely sequenced human genome data. The HapCompass algorithm operates on a graph where single nucleotide polymorphisms (SNPs) are nodes and edges are defined by sequence reads and viewed as supporting evidence of co-occurring SNP alleles in a haplotype. In our graph model, haplotype phasings correspond to spanning trees. We define the minimum weighted edge removal optimization on this graph and develop an algorithm based on cycle basis local optimizations for resolving conflicting evidence. We then estimate the amount of sequencing required to produce a complete haplotype assembly of a chromosome. Using these estimates together with metrics borrowed from genome assembly and haplotype phasing, we compare the accuracy of HapCompass, the Genome Analysis ToolKit, and HapCut for 1000 Genomes Project and simulated data. We show that HapCompass performs significantly better for a variety of data and metrics. HapCompass is freely available for download (www.brown.edu/Research/Istrail_Lab/).
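The abstract describes the graph model only at a high level. The Python below is a minimal sketch under stated assumptions, not the HapCompass implementation. It builds a compass-style graph in which SNPs are nodes and each edge weight is the number of reads supporting a cis pairing (alleles 00/11) minus the number supporting a trans pairing (01/10). This weighting, the function names, and the toy reads are all hypothetical. A phasing is read off a spanning tree by copying or flipping the allele along each tree edge. Every non-tree edge closes one fundamental cycle, and together these cycles form a cycle basis. A cycle conflicts when it contains an odd number of negative edges, which is exactly the case where the non-tree edge disagrees with the phasing the tree implies.

```python
# Hypothetical sketch of a compass-style SNP graph; not the HapCompass code.
from collections import defaultdict


def build_compass_graph(reads):
    """Each read maps SNP index -> observed allele (0 or 1).
    Assumed edge weight = (#reads where the two alleles agree) - (#reads where
    they differ), so w > 0 supports cis (00/11) and w < 0 supports trans (01/10)."""
    weights = defaultdict(int)
    for read in reads:
        snps = sorted(read)
        for a, i in enumerate(snps):
            for j in snps[a + 1:]:
                weights[(i, j)] += 1 if read[i] == read[j] else -1
    # Edges whose evidence cancels out carry no phase information.
    return {edge: w for edge, w in weights.items() if w != 0}


def max_spanning_forest(graph):
    """Kruskal on |w| with union-find: keep the strongest evidence as tree edges."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = {}
    for (i, j), w in sorted(graph.items(), key=lambda kv: -abs(kv[1])):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree[(i, j)] = w
    return tree


def phase_from_tree(tree):
    """Walk each tree: a positive edge copies the allele, a negative edge flips it.
    Returns SNP -> allele on haplotype 1 (haplotype 2 is the complement)."""
    adj = defaultdict(list)
    for (i, j), w in tree.items():
        adj[i].append((j, w))
        adj[j].append((i, w))
    phase = {}
    for root in sorted(adj):
        if root in phase:
            continue
        phase[root] = 0
        stack = [root]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in phase:
                    phase[v] = phase[u] if w > 0 else 1 - phase[u]
                    stack.append(v)
    return phase


def conflicting_cycles(graph, tree, phase):
    """Each non-tree edge closes one fundamental cycle (a cycle basis overall).
    The cycle conflicts when the edge's sign disagrees with the tree's phasing."""
    conflicts = []
    for (i, j), w in graph.items():
        if (i, j) in tree:
            continue
        if (phase[i] == phase[j]) != (w > 0):
            conflicts.append(((i, j), w))
    return conflicts


# Toy reads over SNPs 0, 1, 2 (allele 0 = reference, 1 = alternate).
reads = [
    {0: 0, 1: 0, 2: 1},
    {0: 1, 1: 1},
    {0: 0, 1: 0},
    {1: 0, 2: 1},
    {0: 0, 2: 0},  # these two reads put SNPs 0 and 2 in cis,
    {0: 1, 2: 1},  # contradicting the stronger evidence along 0-1-2
]
graph = build_compass_graph(reads)
tree = max_spanning_forest(graph)
phase = phase_from_tree(tree)
conflicts = conflicting_cycles(graph, tree, phase)
hap1 = [phase[s] for s in sorted(phase)]
hap2 = [1 - a for a in hap1]
print(graph)       # {(0, 1): 3, (0, 2): 1, (1, 2): -2}
print(hap1, hap2)  # [0, 0, 1] [1, 1, 0]
print(conflicts)   # [((0, 2), 1)]
```

In this sketch the spanning tree is chosen greedily by |weight|. Discarding the conflicting non-tree edges is therefore only a simple heuristic for the minimum weighted edge removal problem. In the toy example it removes total weight 1, but it is not guaranteed to be optimal in general. It stands in for the paper's cycle-basis local optimizations rather than reproducing them.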

Similar articles

Decoding Genetic Variations: Communications-Inspired Haplotype Assembly.
IEEE/ACM Trans Comput Biol Bioinform. 2016 May-Jun;13(3):518-30. doi: 10.1109/TCBB.2015.2462367.

Cited by

Reconstruction of evolving gene variants and fitness from short sequencing reads.
Nat Chem Biol. 2021 Nov;17(11):1188-1198. doi: 10.1038/s41589-021-00876-6. Epub 2021 Oct 11.
