Suppr超能文献

一种用于单体型组装问题的高度精确启发式算法。

A highly accurate heuristic algorithm for the haplotype assembly problem.

机构信息

Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong.

出版信息

BMC Genomics. 2013;14 Suppl 2(Suppl 2):S2. doi: 10.1186/1471-2164-14-S2-S2. Epub 2013 Feb 15.

Abstract

BACKGROUND

Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation in human DNA. The sequence of SNPs in each of the two copies of a given chromosome in a diploid organism is referred to as a haplotype. Haplotype information has many applications such as gene disease diagnoses, drug design, etc. The haplotype assembly problem is defined as follows: Given a set of fragments sequenced from the two copies of a chromosome of a single individual, and their locations in the chromosome, which can be pre-determined by aligning the fragments to a reference DNA sequence, the goal here is to reconstruct two haplotypes (h1, h2) from the input fragments. Existing algorithms do not work well when the error rate of fragments is high. Here we design an algorithm that can give accurate solutions, even if the error rate of fragments is high.

RESULTS

We first give a dynamic programming algorithm that can give exact solutions to the haplotype assembly problem. The time complexity of the algorithm is O(n × 2t × t), where n is the number of SNPs, and t is the maximum coverage of a SNP site. The algorithm is slow when t is large. To solve the problem when t is large, we further propose a heuristic algorithm on the basis of the dynamic programming algorithm. Experiments show that our heuristic algorithm can give very accurate solutions.

CONCLUSIONS

We have tested our algorithm on a set of benchmark datasets. Experiments show that our algorithm can give very accurate solutions. It outperforms most of the existing programs when the error rate of the input fragments is high.

摘要

背景

单核苷酸多态性(SNPs)是人类 DNA 中最常见的遗传变异形式。在二倍体生物中,给定染色体的两个副本中 SNP 的序列被称为单倍型。单倍型信息有许多应用,如基因疾病诊断、药物设计等。单倍型组装问题定义如下:给定从单个个体的两条染色体的片段测序,并给定它们在染色体中的位置,这些位置可以通过将片段与参考 DNA 序列对齐来预先确定,目标是从输入片段重建两个单倍型(h1、h2)。现有的算法在片段错误率高时效果不佳。在这里,我们设计了一种即使在片段错误率高的情况下也能给出准确解决方案的算法。

结果

我们首先给出了一种可以为单倍型组装问题提供精确解的动态规划算法。该算法的时间复杂度为 O(n×2t×t),其中 n 是 SNP 的数量,t 是 SNP 位点的最大覆盖度。当 t 较大时,算法较慢。为了解决 t 较大时的问题,我们在动态规划算法的基础上进一步提出了一种启发式算法。实验表明,我们的启发式算法可以给出非常准确的解决方案。

结论

我们在一组基准数据集上测试了我们的算法。实验表明,我们的算法可以给出非常准确的解决方案。当输入片段的错误率较高时,它优于大多数现有程序。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e874/3582451/b92bf6acfe4c/1471-2164-14-S2-S2-1.jpg

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验