HapCUT: an efficient and accurate algorithm for the haplotype assembly problem.

Authors

Bansal Vikas, Bafna Vineet

Affiliation

Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0404, USA.

Publication

Bioinformatics. 2008 Aug 15;24(16):i153-9. doi: 10.1093/bioinformatics/btn298.

Abstract

MOTIVATION

The goal of the haplotype assembly problem is to reconstruct the two haplotypes (chromosomes) for an individual using a mix of sequenced fragments from the two chromosomes. This problem has been shown to be computationally intractable for various optimization criteria. Polynomial time algorithms have been proposed for restricted versions of the problem. In this article, we consider the haplotype assembly problem in the most general setting, i.e. fragments of any length and with an arbitrary number of gaps.

RESULTS

We describe a novel combinatorial approach to the haplotype assembly problem based on computing max-cuts in certain graphs derived from the sequenced fragments. Levy et al. have sequenced the complete genome of a human individual and used a greedy heuristic to assemble the haplotypes for this individual. We have applied our method, HapCUT, to infer haplotypes from this data and demonstrate that the haplotypes inferred using HapCUT are significantly more accurate (20-25% lower maximum error correction scores for all chromosomes) than those produced by the greedy heuristic and by a previously published method, Fast Hare. We also describe a maximum likelihood based estimator of the absolute accuracy of the sequence-based haplotypes using population haplotypes from the International HapMap project.
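
To make the error correction score mentioned above concrete, the following is a minimal Python sketch, not the HapCUT implementation. It assumes the definition commonly used in the haplotype assembly literature (often called the minimum error correction, or MEC, score): each fragment is charged the number of allele calls that disagree with whichever of the two complementary haplotypes it matches best. The abstract itself does not define the score, so this definition, the string encoding of fragments (`0`/`1` alleles, `-` for uncovered sites and gaps), and all function names are illustrative assumptions. The exhaustive search is only feasible for toy inputs and stands in for the max-cut based optimization, which the abstract does not describe in detail.

```python
from itertools import product

GAP = "-"  # assumed marker for a site the fragment does not cover


def mismatches(fragment, haplotype):
    """Count covered sites where the fragment disagrees with the haplotype."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != GAP and f != h)


def complement(haplotype):
    """Return the second haplotype, assuming every site is heterozygous."""
    return "".join("1" if allele == "0" else "0" for allele in haplotype)


def error_correction_score(fragments, haplotype):
    """Sum, over fragments, of the mismatches against the closer haplotype."""
    other = complement(haplotype)
    return sum(
        min(mismatches(f, haplotype), mismatches(f, other)) for f in fragments
    )


def brute_force_best(fragments, n_sites):
    """Exhaustively find a haplotype with the lowest score (toy sizes only)."""
    best = None
    for bits in product("01", repeat=n_sites):
        candidate = "".join(bits)
        if candidate[0] == "1":
            continue  # a haplotype and its complement score the same
        score = error_correction_score(fragments, candidate)
        if best is None or score < best[1]:
            best = (candidate, score)
    return best


# Hypothetical toy data: six fragments over four variant sites.
fragments = ["01--", "-10-", "--01", "10--", "-01-", "--11"]
best_haplotype, best_score = brute_force_best(fragments, 4)
print(best_haplotype, complement(best_haplotype), best_score)
```

In this toy input the fragments `--01` and `--11` contradict each other at the last two sites, so no haplotype pair can explain every fragment without at least one corrected allele call. A lower score means fewer such corrections, which is the sense in which the abstract reports lower scores as more accurate.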

AVAILABILITY

A program implementing HapCUT is available on request.
