An average-case sublinear forward algorithm for the haploid Li and Stephens model.

Author information

Rosen Yohei M, Paten Benedict J

Affiliations

1 UCSC Genomics Institute, 1156 High St, Santa Cruz, CA 95064 USA.

2 NYU School of Medicine, 550 First Ave, New York, NY 10016 USA.

Publication information

Algorithms Mol Biol. 2019 Apr 2;14:11. doi: 10.1186/s13015-019-0144-9. eCollection 2019.

DOI:10.1186/s13015-019-0144-9

PMID:30988694

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6446408/

Abstract

BACKGROUND

Hidden Markov models of haplotype inheritance, such as the Li and Stephens model, allow for computationally tractable probability calculations using the forward algorithm as long as the representative reference panel used in the model is sufficiently small. Specifically, the forward algorithm for the monoploid Li and Stephens model and its variants runs in time linear in reference panel size unless heuristic approximations are used. However, sequencing projects numbering in the thousands to hundreds of thousands of individuals are underway, and others numbering in the millions are anticipated.
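To make the linear dependence on panel size concrete, the following is a minimal sketch of the conventional forward recursion, not the authors' code. It assumes one common parameterization that the abstract does not specify: a constant per-site switch probability `rho` spread uniformly over the `k` reference haplotypes, and a constant mismatch probability `mu`. The function name and argument layout are hypothetical.

```python
def forward_dense(panel, query, rho, mu):
    """Return P(query | panel) under a haploid Li and Stephens style HMM.

    panel: list of k reference haplotypes, each a list of n alleles (0/1)
    query: list of n alleles
    rho:   per-site switch probability (assumed constant across sites)
    mu:    per-site mismatch probability (assumed constant across sites)
    """
    k, n = len(panel), len(query)

    def emit(j, i):
        # Emission probability of the query allele when copying haplotype j
        return 1.0 - mu if panel[j][i] == query[i] else mu

    # Uniform prior over which haplotype is copied at the first site
    f = [emit(j, 0) / k for j in range(k)]
    for i in range(1, n):
        s = sum(f)  # total forward mass at the previous site
        # Every one of the k entries is touched at every site: O(n * k) overall
        f = [emit(j, i) * ((1.0 - rho) * f[j] + rho * s / k) for j in range(k)]
    return sum(f)
```

Each site updates all `k` forward values, so the runtime of this baseline grows linearly with the reference panel.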

RESULTS

To make the forward algorithm for the haploid Li and Stephens model computationally tractable for these datasets, we have created a numerically exact version of the algorithm with observed average-case sublinear runtime with respect to reference panel size when tested against the 1000 Genomes dataset.

CONCLUSIONS

We show a forward algorithm which avoids any tradeoff between runtime and model complexity. Our algorithm makes use of two general strategies which might be applicable to improving the time complexity of other future sequence analysis algorithms: sparse dynamic programming matrices and lazy evaluation.
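The two strategies named above can be illustrated with a simplified sketch. It is not the authors' algorithm: it swaps in a single shared affine map for all haplotypes, and it assumes the same hypothetical parameterization (constant switch probability `rho`, constant mismatch probability `mu`, biallelic sites). The panel is stored sparsely as, per site, the major allele plus the indices of haplotypes carrying the other allele. Haplotypes with the major allele all receive the same update `f -> a*f + b`, so that update is recorded once and applied lazily; only minor-allele carriers are touched explicitly. The sketch does no rescaling, so unlike the published method it makes no claim to numerical exactness on long inputs.

```python
def sparsify(panel):
    """Per site, return the major allele and the indices carrying the other allele."""
    k, n = len(panel), len(panel[0])
    majors, minor_sets = [], []
    for i in range(n):
        ones = sum(panel[j][i] for j in range(k))
        major = 1 if 2 * ones > k else 0
        majors.append(major)
        minor_sets.append([j for j in range(k) if panel[j][i] != major])
    return majors, minor_sets


def forward_lazy(k, majors, minor_sets, query, rho, mu):
    """Forward probability with lazily applied updates (illustrative sketch).

    Invariant: the true forward value of haplotype j is A * g.get(j, 0.0) + B.
    Requires 0 < mu < 1 and rho < 1 so that the shared scale A stays nonzero.
    """
    def emissions(i):
        # (emission for major-allele carriers, emission for minor-allele carriers)
        match, mismatch = 1.0 - mu, mu
        return (match, mismatch) if query[i] == majors[i] else (mismatch, match)

    e_maj, e_min = emissions(0)
    A, B, g = 1.0, e_maj / k, {}
    for j in minor_sets[0]:
        g[j] = (e_min / k - B) / A
    m = len(minor_sets[0])
    S = (e_maj * (k - m) + e_min * m) / k  # sum of all forward values

    for i in range(1, len(query)):
        e_maj, e_min = emissions(i)
        M = minor_sets[i]
        f_M = [A * g.get(j, 0.0) + B for j in M]  # materialize only these values
        s_M = sum(f_M)
        jump = rho * S / k
        # Shared affine update f -> a*f + b for every major-allele haplotype
        a, b = e_maj * (1.0 - rho), e_maj * jump
        A_new, B_new = a * A, a * B + b
        for j, fj in zip(M, f_M):
            f_new = e_min * ((1.0 - rho) * fj + jump)
            g[j] = (f_new - B_new) / A_new  # store so the invariant still holds
        # Update the running sum using only the minor-allele carriers
        S = (e_maj * ((1.0 - rho) * (S - s_M) + (k - len(M)) * jump)
             + e_min * ((1.0 - rho) * s_M + len(M) * jump))
        A, B = A_new, B_new
    return S
```

The work per site is proportional to the number of minor-allele carriers rather than to `k`, which is small on average when most variants are rare. `sparsify` itself is linear in the panel and stands in for a precomputed sparse representation.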

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/614f/6446408/90b021dc2e35/13015_2019_144_Fig1_HTML.jpg
