Finding all maximal perfect haplotype blocks in linear time.

Authors

Alanko Jarno, Bannai Hideo, Cazaux Bastien, Peterlongo Pierre, Stoye Jens

Affiliations

1. Department of Computer Science, University of Helsinki, Helsinki, Finland.

2. Department of Informatics, Kyushu University, Fukuoka, Japan.

Publication information

Algorithms Mol Biol. 2020 Feb 10;15:2. doi: 10.1186/s13015-020-0163-6. eCollection 2020.

DOI:10.1186/s13015-020-0163-6

PMID:32055252

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7008532/

Abstract

Recent large-scale community sequencing efforts allow at an unprecedented level of detail the identification of genomic regions that show signatures of natural selection. Traditional methods for identifying such regions from individuals' haplotype data, however, require excessive computing times and therefore are not applicable to current datasets. In 2019, Cunha et al. (Advances in bioinformatics and computational biology: 11th Brazilian symposium on bioinformatics, BSB 2018, Niterói, Brazil, October 30 - November 1, 2018, Proceedings, 2018. 10.1007/978-3-030-01722-4_3) suggested the maximal perfect haplotype block as a very simple combinatorial pattern, forming the basis of a new method to perform rapid genome-wide selection scans. The algorithm they presented for identifying these blocks, however, had a worst-case running time quadratic in the genome length. It was posed as an open problem whether an optimal, linear-time algorithm exists. In this paper we give two algorithms that achieve this time bound, one conceptually very simple one using suffix trees and a second one using the positional Burrows-Wheeler Transform, that is very efficient also in practice.
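To make the pattern named in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not one of the paper's linear-time algorithms (neither the suffix-tree nor the positional Burrows-Wheeler Transform variant); it only enumerates blocks directly from a definition that is assumed here, since the abstract does not spell it out: a block is a set of at least two haplotypes that agree on columns i..j, cannot gain another agreeing haplotype, and cannot be extended one column to the left or right while all its haplotypes still agree. The function name `maximal_perfect_blocks` and the 0-based, inclusive column indices are illustrative choices.

```python
from collections import defaultdict


def maximal_perfect_blocks(haplotypes):
    """Enumerate blocks (rows, i, j) by brute force; far slower than linear time.

    Assumed definition: rows agree on columns i..j (0-based, inclusive),
    the row set is as large as possible, and the interval cannot be
    extended left or right with all rows still agreeing.
    """
    if not haplotypes:
        return []
    n = len(haplotypes[0])
    if any(len(h) != n for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")

    blocks = []
    for i in range(n):
        for j in range(i, n):
            # Grouping by the substring makes every group row-maximal.
            groups = defaultdict(list)
            for row, h in enumerate(haplotypes):
                groups[h[i:j + 1]].append(row)
            for rows in groups.values():
                if len(rows) < 2:
                    continue
                # Left-maximal: at the first column, or rows disagree at i-1.
                left_max = i == 0 or len({haplotypes[r][i - 1] for r in rows}) > 1
                # Right-maximal: at the last column, or rows disagree at j+1.
                right_max = j == n - 1 or len({haplotypes[r][j + 1] for r in rows}) > 1
                if left_max and right_max:
                    blocks.append((tuple(rows), i, j))
    return blocks


if __name__ == "__main__":
    for block in maximal_perfect_blocks(["010", "011", "110"]):
        print(block)
```

On the toy input `["010", "011", "110"]` the sketch reports three blocks: rows 0 and 1 on columns 0..1, all three rows on column 1, and rows 0 and 2 on columns 1..2. The nested loops over all column intervals make this cubic in the worst case, which is exactly the kind of cost the algorithms in the paper are designed to avoid.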


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1721/7008532/2030bd5bbdb2/13015_2020_163_Fig1_HTML.jpg

Similar articles

Efficient haplotype matching and storage using the positional Burrows-Wheeler transform (PBWT).

Bioinformatics. 2014 May 1;30(9):1266-72. doi: 10.1093/bioinformatics/btu014. Epub 2014 Jan 9.

Graphical pan-genome analysis with compressed suffix trees and the Burrows-Wheeler transform.

Bioinformatics. 2016 Feb 15;32(4):497-504. doi: 10.1093/bioinformatics/btv603. Epub 2015 Oct 26.

Efficient haplotype block recognition of very long and dense genetic sequences.

BMC Bioinformatics. 2014 Jan 14;15:10. doi: 10.1186/1471-2105-15-10.

HaploBlocks: Efficient Detection of Positive Selection in Large Population Genomic Datasets.

Mol Biol Evol. 2023 Mar 4;40(3). doi: 10.1093/molbev/msad027.

Efficient maximal repeat finding using the Burrows-Wheeler transform and wavelet tree.

IEEE/ACM Trans Comput Biol Bioinform. 2012;9(2):421-9. doi: 10.1109/TCBB.2011.127. Epub 2011 Sep 27.

Maximal Perfect Haplotype Blocks with Wildcards.

iScience. 2020 Jun 26;23(6):101149. doi: 10.1016/j.isci.2020.101149. Epub 2020 May 11.

d-PBWT: dynamic positional Burrows-Wheeler transform.

Bioinformatics. 2021 Aug 25;37(16):2390-2397. doi: 10.1093/bioinformatics/btab117.

Exploiting parallelization in positional Burrows-Wheeler transform (PBWT) algorithms for efficient haplotype matching and compression.

Bioinform Adv. 2023 Mar 2;3(1):vbad021. doi: 10.1093/bioadv/vbad021. eCollection 2023.

An improved encoding of genetic variation in a Burrows-Wheeler transform.

Bioinformatics. 2020 Mar 1;36(5):1413-1419. doi: 10.1093/bioinformatics/btz782.

Cited by

Haplotype Matching with GBWT for Pangenome Graphs.

bioRxiv. 2025 Feb 7:2025.02.03.634410. doi: 10.1101/2025.02.03.634410.

PangeBlocks: customized construction of pangenome graphs via maximal blocks.

BMC Bioinformatics. 2024 Nov 4;25(1):344. doi: 10.1186/s12859-024-05958-5.

Discovery of runs-of-homozygosity diplotype clusters and their associations with diseases in UK Biobank.

Elife. 2024 Jun 21;13:e81698. doi: 10.7554/eLife.81698.

μ-PBWT: a lightweight r-indexing of the PBWT for storing and querying UK Biobank data.

Bioinformatics. 2023 Sep 2;39(9). doi: 10.1093/bioinformatics/btad552.

Fast inference of genetic recombination rates in biobank scale data.

Genome Res. 2023 Jul;33(7):1015-1022. doi: 10.1101/gr.277676.123. Epub 2023 Jun 22.

HaploBlocks: Efficient Detection of Positive Selection in Large Population Genomic Datasets.

Mol Biol Evol. 2023 Mar 4;40(3). doi: 10.1093/molbev/msad027.

FastRecomb: Fast inference of genetic recombination rates in biobank scale data.

bioRxiv. 2023 Jan 10:2023.01.09.523304. doi: 10.1101/2023.01.09.523304.

d-PBWT: dynamic positional Burrows-Wheeler transform.

Bioinformatics. 2021 Aug 25;37(16):2390-2397. doi: 10.1093/bioinformatics/btab117.

Discovery of runs-of-homozygosity diplotype clusters and their associations with diseases in UK Biobank.

medRxiv. 2020 Oct 27:2020.10.26.20220004. doi: 10.1101/2020.10.26.20220004.

Maximal Perfect Haplotype Blocks with Wildcards.

iScience. 2020 Jun 26;23(6):101149. doi: 10.1016/j.isci.2020.101149. Epub 2020 May 11.

References

The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019.

Nucleic Acids Res. 2019 Jan 8;47(D1):D1005-D1012. doi: 10.1093/nar/gky1120.

Haplotype matching in large cohorts using the Li and Stephens model.

Bioinformatics. 2019 Mar 1;35(5):798-806. doi: 10.1093/bioinformatics/bty735.

The 100 000 Genomes Project: bringing whole genome sequencing to the NHS.

BMJ. 2018 Apr 24;361:k1687. doi: 10.1136/bmj.k1687.

A high-quality human reference panel reveals the complexity and distribution of genomic structural variants.

Nat Commun. 2016 Oct 6;7:12989. doi: 10.1038/ncomms12989.

An integrated map of structural variation in 2,504 human genomes.

Nature. 2015 Oct 1;526(7571):75-81. doi: 10.1038/nature15394.

A global reference for human genetic variation.

Nature. 2015 Oct 1;526(7571):68-74. doi: 10.1038/nature15393.

Large-scale whole-genome sequencing of the Icelandic population.

Nat Genet. 2015 May;47(5):435-44. doi: 10.1038/ng.3247. Epub 2015 Mar 25.

A hidden Markov model for investigating recent positive selection through haplotype structure.

Theor Popul Biol. 2015 Feb;99:18-30. doi: 10.1016/j.tpb.2014.11.001. Epub 2014 Nov 13.

Efficient haplotype matching and storage using the positional Burrows-Wheeler transform (PBWT).

Bioinformatics. 2014 May 1;30(9):1266-72. doi: 10.1093/bioinformatics/btu014. Epub 2014 Jan 9.
