Fast and Robust Identity-by-Descent Inference with the Templated Positional Burrows-Wheeler Transform.

Affiliation

23andMe, Inc, Sunnyvale, CA, USA.

Publication information

Mol Biol Evol. 2021 May 4;38(5):2131-2151. doi: 10.1093/molbev/msaa328.

DOI: 10.1093/molbev/msaa328

PMID: 33355662

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8097300/

Abstract

Estimating the genomic location and length of identical-by-descent (IBD) segments among individuals is a crucial step in many genetic analyses. However, the exponential growth in the size of biobank and direct-to-consumer genetic data sets makes accurate IBD inference a significant computational challenge. Here we present the templated positional Burrows-Wheeler transform (TPBWT) to make fast IBD estimates robust to genotype and phasing errors. Using haplotype data simulated over pedigrees with realistic genotyping and phasing errors, we show that the TPBWT outperforms other state-of-the-art IBD inference algorithms in terms of speed and accuracy. For each phase-aware method, we explore the false positive and false negative rates of inferring IBD by segment length and characterize the types of error commonly found. Our results highlight the fragility of most phased IBD inference methods; the accuracy of IBD estimates can be highly sensitive to the quality of haplotype phasing. Additionally, we compare the performance of the TPBWT against a widely used phase-free IBD inference approach that is robust to phasing errors. We introduce both in-sample and out-of-sample TPBWT-based IBD inference algorithms and demonstrate their computational efficiency on massive-scale data sets with millions of samples. Furthermore, we describe the binary file format for TPBWT-compressed haplotypes that results in fast and efficient out-of-sample IBD computes against very large cohort panels. Finally, we demonstrate the utility of the TPBWT in a brief empirical analysis, exploring geographic patterns of haplotype sharing within Mexico. Hierarchical clustering of IBD shared across regions within Mexico reveals geographically structured haplotype sharing and a strong signal of isolation by distance. Our software implementation of the TPBWT is freely available for noncommercial use in the code repository (https://github.com/23andMe/phasedibd, last accessed January 11, 2021).
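For orientation, the following is a minimal sketch of segment matching with the ordinary positional Burrows-Wheeler transform (PBWT), the structure that the TPBWT builds on. It is not the TPBWT and not the phasedibd implementation: the function name `pbwt_matches`, the toy haplotypes, and the use of a site count as the length threshold are all assumptions made for illustration. It reports every pair of haplotypes that is identical over at least `min_sites` consecutive sites.

```python
def pbwt_matches(haps, min_sites):
    """Find maximal exact matches between haplotypes with a plain PBWT.

    haps: list of equal-length lists of 0/1 alleles (hypothetical toy input).
    Returns sorted tuples (hap_a, hap_b, start, end), end exclusive, for every
    pair that is identical on sites [start, end) with end - start >= min_sites.
    """
    n_haps, n_sites = len(haps), len(haps[0])
    order = list(range(n_haps))  # positional prefix array
    div = [0] * n_haps           # div[i]: first site of the match between order[i] and order[i-1]
    found = []

    def report(k):
        # Pairs adjacent in `order` share the longest matches ending just before site k.
        for i in range(n_haps):
            start = 0
            for j in range(i + 1, n_haps):
                start = max(start, div[j])    # match start for the pair (order[i], order[j])
                if k - start < min_sites:     # start only grows with j, so stop here
                    break
                a, b = order[i], order[j]
                # Report only when the match cannot be extended through site k.
                if k == n_sites or haps[a][k] != haps[b][k]:
                    found.append((min(a, b), max(a, b), start, k))

    for k in range(n_sites):
        report(k)
        # Stable partition by the allele at site k, updating the divergence values.
        zeros, ones, div_zeros, div_ones = [], [], [], []
        p = q = k + 1
        for idx, d in zip(order, div):
            p, q = max(p, d), max(q, d)
            if haps[idx][k] == 0:
                zeros.append(idx)
                div_zeros.append(p)
                p = 0
            else:
                ones.append(idx)
                div_ones.append(q)
                q = 0
        order, div = zeros + ones, div_zeros + div_ones
    report(n_sites)  # matches that run to the end of the alignment
    return sorted(found)


# Toy example: four haplotypes, six sites.
toy_haps = [
    [0, 1, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
print(pbwt_matches(toy_haps, 3))
# [(0, 1, 0, 4), (0, 2, 1, 6), (1, 2, 1, 4)]
```

Because this sketch requires exact identity, changing a single allele inside a shared segment (for example, site 3 of haplotype 2 above) splits the segment into shorter runs that fall below the threshold. That sensitivity to genotype and phasing errors is the problem the abstract says the TPBWT is designed to address.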


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e43/8097300/d245756d1ce9/msaa328f1.jpg
