• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

一种用于单体型组装问题的高度精确启发式算法。

A highly accurate heuristic algorithm for the haplotype assembly problem.

机构信息

Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong.

出版信息

BMC Genomics. 2013;14 Suppl 2(Suppl 2):S2. doi: 10.1186/1471-2164-14-S2-S2. Epub 2013 Feb 15.

DOI:10.1186/1471-2164-14-S2-S2
PMID:23445458
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC3582451/
Abstract

BACKGROUND

Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation in human DNA. The sequence of SNPs in each of the two copies of a given chromosome in a diploid organism is referred to as a haplotype. Haplotype information has many applications such as gene disease diagnoses, drug design, etc. The haplotype assembly problem is defined as follows: Given a set of fragments sequenced from the two copies of a chromosome of a single individual, and their locations in the chromosome, which can be pre-determined by aligning the fragments to a reference DNA sequence, the goal here is to reconstruct two haplotypes (h1, h2) from the input fragments. Existing algorithms do not work well when the error rate of fragments is high. Here we design an algorithm that can give accurate solutions, even if the error rate of fragments is high.

RESULTS

We first give a dynamic programming algorithm that can give exact solutions to the haplotype assembly problem. The time complexity of the algorithm is O(n × 2t × t), where n is the number of SNPs, and t is the maximum coverage of a SNP site. The algorithm is slow when t is large. To solve the problem when t is large, we further propose a heuristic algorithm on the basis of the dynamic programming algorithm. Experiments show that our heuristic algorithm can give very accurate solutions.

CONCLUSIONS

We have tested our algorithm on a set of benchmark datasets. Experiments show that our algorithm can give very accurate solutions. It outperforms most of the existing programs when the error rate of the input fragments is high.

摘要

背景

单核苷酸多态性(SNPs)是人类 DNA 中最常见的遗传变异形式。在二倍体生物中,给定染色体的两个副本中 SNP 的序列被称为单倍型。单倍型信息有许多应用,如基因疾病诊断、药物设计等。单倍型组装问题定义如下:给定从单个个体的两条染色体的片段测序,并给定它们在染色体中的位置,这些位置可以通过将片段与参考 DNA 序列对齐来预先确定,目标是从输入片段重建两个单倍型(h1、h2)。现有的算法在片段错误率高时效果不佳。在这里,我们设计了一种即使在片段错误率高的情况下也能给出准确解决方案的算法。

结果

我们首先给出了一种可以为单倍型组装问题提供精确解的动态规划算法。该算法的时间复杂度为 O(n×2t×t),其中 n 是 SNP 的数量,t 是 SNP 位点的最大覆盖度。当 t 较大时,算法较慢。为了解决 t 较大时的问题,我们在动态规划算法的基础上进一步提出了一种启发式算法。实验表明,我们的启发式算法可以给出非常准确的解决方案。

结论

我们在一组基准数据集上测试了我们的算法。实验表明,我们的算法可以给出非常准确的解决方案。当输入片段的错误率较高时,它优于大多数现有程序。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e874/3582451/48e5406b6a1a/1471-2164-14-S2-S2-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e874/3582451/b92bf6acfe4c/1471-2164-14-S2-S2-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e874/3582451/4657fe71a757/1471-2164-14-S2-S2-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e874/3582451/48e5406b6a1a/1471-2164-14-S2-S2-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e874/3582451/b92bf6acfe4c/1471-2164-14-S2-S2-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e874/3582451/4657fe71a757/1471-2164-14-S2-S2-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e874/3582451/48e5406b6a1a/1471-2164-14-S2-S2-3.jpg

相似文献

1
A highly accurate heuristic algorithm for the haplotype assembly problem.一种用于单体型组装问题的高度精确启发式算法。
BMC Genomics. 2013;14 Suppl 2(Suppl 2):S2. doi: 10.1186/1471-2164-14-S2-S2. Epub 2013 Feb 15.
2
Better ILP models for haplotype assembly.更好的用于单体型组装的 ILP 模型。
BMC Bioinformatics. 2018 Feb 19;19(Suppl 1):52. doi: 10.1186/s12859-018-2012-x.
3
Haplotype assembly from aligned weighted SNP fragments.基于比对加权单核苷酸多态性片段的单倍型组装
Comput Biol Chem. 2005 Aug;29(4):281-7. doi: 10.1016/j.compbiolchem.2005.05.001.
4
GenHap: a novel computational method based on genetic algorithms for haplotype assembly.GenHap:一种基于遗传算法的新型单倍型组装计算方法。
BMC Bioinformatics. 2019 Apr 18;20(Suppl 4):172. doi: 10.1186/s12859-019-2691-y.
5
Joint haplotype assembly and genotype calling via sequential Monte Carlo algorithm.通过序贯蒙特卡罗算法进行联合单倍型组装和基因型分型
BMC Bioinformatics. 2015 Jul 16;16:223. doi: 10.1186/s12859-015-0651-8.
6
HapCUT: an efficient and accurate algorithm for the haplotype assembly problem.HapCUT:一种用于单倍型组装问题的高效且准确的算法。
Bioinformatics. 2008 Aug 15;24(16):i153-9. doi: 10.1093/bioinformatics/btn298.
7
SpeedHap: an accurate heuristic for the single individual SNP haplotyping problem with many gaps, high reading error rate and low coverage.SpeedHap:一种针对存在许多缺口、高读取错误率和低覆盖率的单一个体单核苷酸多态性单倍型分型问题的精确启发式算法。
IEEE/ACM Trans Comput Biol Bioinform. 2008 Oct-Dec;5(4):492-502. doi: 10.1109/TCBB.2008.67.
8
Exact algorithms for haplotype assembly from whole-genome sequence data.全基因组序列数据中单体型组装的精确算法。
Bioinformatics. 2013 Aug 15;29(16):1938-45. doi: 10.1093/bioinformatics/btt349. Epub 2013 Jun 18.
9
Optimal algorithms for haplotype assembly from whole-genome sequence data.从全基因组序列数据中进行单倍型组装的最优算法。
Bioinformatics. 2010 Jun 15;26(12):i183-90. doi: 10.1093/bioinformatics/btq215.
10
On the Minimum Error Correction Problem for Haplotype Assembly in Diploid and Polyploid Genomes.关于二倍体和多倍体基因组中单倍型组装的最小错误校正问题
J Comput Biol. 2016 Sep;23(9):718-36. doi: 10.1089/cmb.2015.0220. Epub 2016 Jun 9.

引用本文的文献

1
Minimum error correction-based haplotype assembly: Considerations for long read data.基于最小错误校正的单倍型组装:长读数据的考虑因素。
PLoS One. 2020 Jun 12;15(6):e0234470. doi: 10.1371/journal.pone.0234470. eCollection 2020.
2
NGS based haplotype assembly using matrix completion.基于矩阵补全的 NGS 单倍型组装。
PLoS One. 2019 Mar 26;14(3):e0214455. doi: 10.1371/journal.pone.0214455. eCollection 2019.
3
Efficient algorithms for polyploid haplotype phasing.高效的多倍体单体型相位算法。

本文引用的文献

1
A comparison of several algorithms for the single individual SNP haplotyping reconstruction problem.几种算法在单个体 SNP 单体型重构问题上的比较。
Bioinformatics. 2010 Sep 15;26(18):2217-25. doi: 10.1093/bioinformatics/btq411. Epub 2010 Jul 11.
2
Optimal algorithms for haplotype assembly from whole-genome sequence data.从全基因组序列数据中进行单倍型组装的最优算法。
Bioinformatics. 2010 Jun 15;26(12):i183-90. doi: 10.1093/bioinformatics/btq215.
3
SpeedHap: an accurate heuristic for the single individual SNP haplotyping problem with many gaps, high reading error rate and low coverage.
BMC Genomics. 2018 May 9;19(Suppl 2):110. doi: 10.1186/s12864-018-4464-9.
4
Sparse Tensor Decomposition for Haplotype Assembly of Diploids and Polyploids.二倍体和多倍体单体型组装的稀疏张量分解。
BMC Genomics. 2018 Mar 21;19(Suppl 4):191. doi: 10.1186/s12864-018-4551-y.
5
Dense and accurate whole-chromosome haplotyping of individual genomes.个体基因组的密集且精确的全染色体单倍型分型。
Nat Commun. 2017 Nov 3;8(1):1293. doi: 10.1038/s41467-017-01389-4.
6
PWHATSHAP: efficient haplotyping for future generation sequencing.PWHATSHAP:用于下一代测序的高效单倍型分型
BMC Bioinformatics. 2016 Sep 22;17(Suppl 11):342. doi: 10.1186/s12859-016-1170-y.
7
Rising Strengths Hong Kong SAR in Bioinformatics.香港特区生物信息学实力不断增强。
Interdiscip Sci. 2017 Jun;9(2):224-236. doi: 10.1007/s12539-016-0147-x. Epub 2016 Mar 9.
8
Joint haplotype assembly and genotype calling via sequential Monte Carlo algorithm.通过序贯蒙特卡罗算法进行联合单倍型组装和基因型分型
BMC Bioinformatics. 2015 Jul 16;16:223. doi: 10.1186/s12859-015-0651-8.
9
Whole-genome haplotyping approaches and genomic medicine.全基因组单倍型分析方法与基因组医学
Genome Med. 2014 Sep 25;6(9):73. doi: 10.1186/s13073-014-0073-7. eCollection 2014.
SpeedHap:一种针对存在许多缺口、高读取错误率和低覆盖率的单一个体单核苷酸多态性单倍型分型问题的精确启发式算法。
IEEE/ACM Trans Comput Biol Bioinform. 2008 Oct-Dec;5(4):492-502. doi: 10.1109/TCBB.2008.67.
4
HapCUT: an efficient and accurate algorithm for the haplotype assembly problem.HapCUT:一种用于单倍型组装问题的高效且准确的算法。
Bioinformatics. 2008 Aug 15;24(16):i153-9. doi: 10.1093/bioinformatics/btn298.
5
An MCMC algorithm for haplotype assembly from whole-genome sequence data.一种用于从全基因组序列数据中进行单倍型组装的马尔可夫链蒙特卡罗算法。
Genome Res. 2008 Aug;18(8):1336-46. doi: 10.1101/gr.077065.108.
6
A model of higher accuracy for the individual haplotyping problem based on weighted SNP fragments and genotype with errors.一种基于加权单核苷酸多态性片段和含误差基因型的个体单倍型分型问题的更高精度模型。
Bioinformatics. 2008 Jul 1;24(13):i105-13. doi: 10.1093/bioinformatics/btn147.
7
Linear time probabilistic algorithms for the singular haplotype reconstruction problem from SNP fragments.用于从单核苷酸多态性(SNP)片段进行奇异单倍型重建问题的线性时间概率算法。
J Comput Biol. 2008 Jun;15(5):535-46. doi: 10.1089/cmb.2008.0003.
8
The diploid genome sequence of an individual human.某个人类个体的二倍体基因组序列。
PLoS Biol. 2007 Sep 4;5(10):e254. doi: 10.1371/journal.pbio.0050254.
9
A clustering algorithm based on two distance functions for MEC model.一种基于两种距离函数的用于MEC模型的聚类算法。
Comput Biol Chem. 2007 Apr;31(2):148-50. doi: 10.1016/j.compbiolchem.2007.02.001. Epub 2007 Feb 9.
10
A haplotype map of the human genome.人类基因组单倍型图谱。
Nature. 2005 Oct 27;437(7063):1299-320. doi: 10.1038/nature04226.