Biobank-scale inference of multi-individual identity by descent and gene conversion.

Author affiliations

Department of Biostatistics, University of Washington, Seattle, WA, USA.

Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA.

Publication information

Am J Hum Genet. 2024 Apr 4;111(4):691-700. doi: 10.1016/j.ajhg.2024.02.015. Epub 2024 Mar 20.

DOI: 10.1016/j.ajhg.2024.02.015

PMID: 38513668

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11023918/

Abstract

We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more computationally efficient inference of identity by descent (IBD) than approaches that infer pairwise IBD segments and provides locus-specific IBD clusters rather than IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2,900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach. Our IBD clustering method is implemented in the open-source ibd-cluster software package.
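To make the distinction between pairwise IBD segments and locus-specific IBD clusters concrete, here is a minimal sketch. It is not the ibd-cluster algorithm, which per the abstract avoids inferring pairwise segments altogether. It assumes a hypothetical toy list of pairwise segments given as `(hap_a, hap_b, start, end)` tuples and groups the haplotypes connected at one position with a union-find structure. The function name, input format, and haplotype labels are illustrative assumptions.

```python
from collections import defaultdict


def clusters_at_locus(segments, position):
    """Group haplotypes linked by pairwise IBD segments that cover `position`.

    Illustrative only: the segment tuples are a hypothetical input format,
    and this is not how the ibd-cluster software computes its clusters.
    """
    parent = {}

    def find(x):
        # Register unseen haplotypes as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for hap_a, hap_b, start, end in segments:
        if start <= position <= end:
            root_a, root_b = find(hap_a), find(hap_b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    groups = defaultdict(set)
    for hap in list(parent):
        groups[find(hap)].add(hap)
    # Sort members and clusters so the output is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical toy data: (haplotype A, haplotype B, start bp, end bp).
segments = [
    ("h1", "h2", 100, 900),
    ("h2", "h3", 400, 1200),
    ("h4", "h5", 50, 600),
    ("h1", "h5", 1000, 1500),
]
clusters = clusters_at_locus(segments, 500)
```

A cluster of k haplotypes stands in for up to k(k-1)/2 pairwise relationships at that locus but needs only k membership labels, which is one way to see why cluster output can grow linearly with sample size while pairwise output cannot.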

Similar articles

Biobank-scale inference of multi-individual identity by descent and gene conversion.

Am J Hum Genet. 2024 Apr 4;111(4):691-700. doi: 10.1016/j.ajhg.2024.02.015. Epub 2024 Mar 20.

Biobank-scale inference of multi-individual identity by descent and gene conversion.

bioRxiv. 2023 Nov 5:2023.11.03.565574. doi: 10.1101/2023.11.03.565574.

A Fast and Simple Method for Detecting Identity-by-Descent Segments in Large-Scale Data.

Am J Hum Genet. 2020 Apr 2;106(4):426-437. doi: 10.1016/j.ajhg.2020.02.010. Epub 2020 Mar 12.

Probabilistic Estimation of Identity by Descent Segment Endpoints and Detection of Recent Selection.

Am J Hum Genet. 2020 Nov 5;107(5):895-910. doi: 10.1016/j.ajhg.2020.09.010. Epub 2020 Oct 13.

Rapid, Phase-free Detection of Long Identity-by-Descent Segments Enables Effective Relationship Classification.

Am J Hum Genet. 2020 Apr 2;106(4):453-466. doi: 10.1016/j.ajhg.2020.02.012. Epub 2020 Mar 19.

RaPID: ultra-fast, powerful, and accurate detection of segments identical by descent (IBD) in biobank-scale cohorts.

Genome Biol. 2019 Jul 25;20(1):143. doi: 10.1186/s13059-019-1754-8.

Personalized genealogical history of UK individuals inferred from biobank-scale IBD segments.

BMC Biol. 2021 Feb 16;19(1):32. doi: 10.1186/s12915-021-00964-y.

FiMAP: A fast identity-by-descent mapping test for biobank-scale cohorts.

PLoS Genet. 2023 Dec 1;19(12):e1011057. doi: 10.1371/journal.pgen.1011057. eCollection 2023 Dec.

RAFFI: Accurate and fast familial relationship inference in large scale biobank studies using RaPID.

PLoS Genet. 2021 Jan 21;17(1):e1009315. doi: 10.1371/journal.pgen.1009315. eCollection 2021 Jan.

Efficient clustering of identity-by-descent between multiple individuals.

Bioinformatics. 2014 Apr 1;30(7):915-22. doi: 10.1093/bioinformatics/btt734. Epub 2013 Dec 19.

Cited by

Estimating gene conversion rates from population data using multi-individual identity by descent.

Am J Hum Genet. 2025 Aug 16. doi: 10.1016/j.ajhg.2025.07.019.

Identity-By-Descent Mapping Using Multi-Individual IBD With Genome-Wide Multiple Testing Adjustment.

Genet Epidemiol. 2025 Sep;49(6):e70015. doi: 10.1002/gepi.70015.

Power and Limitations of Inferring Genetic Ancestry.

Ann Hum Genet. 2025 Sep;89(5):264-273. doi: 10.1111/ahg.70007. Epub 2025 Jul 15.

SPC: a SPectral Component approach to address recent population structure in genomic analysis.

medRxiv. 2025 Jun 5:2025.06.04.25328990. doi: 10.1101/2025.06.04.25328990.

It's a wrap: deriving distinct discoveries with FDR control after a GWAS pipeline.

bioRxiv. 2025 Jul 9:2025.06.05.658138. doi: 10.1101/2025.06.05.658138.

Fast simulation of identity-by-descent segments.

Bull Math Biol. 2025 May 23;87(7):84. doi: 10.1007/s11538-025-01464-8.

Estimating gene conversion rates from population data using multi-individual identity by descent.

bioRxiv. 2025 Feb 27:2025.02.22.639693. doi: 10.1101/2025.02.22.639693.

Mean gene conversion tract length in humans estimated to be 459 bp from UK Biobank sequence data.

bioRxiv. 2025 Jan 16:2024.12.30.630818. doi: 10.1101/2024.12.30.630818.

Complete human recombination maps.

Nature. 2025 Mar;639(8055):700-707. doi: 10.1038/s41586-024-08450-5. Epub 2025 Jan 22.

Fast simulation of identity-by-descent segments.

bioRxiv. 2025 Jan 7:2024.12.13.628449. doi: 10.1101/2024.12.13.628449.

References

Identity-by-descent-based estimation of the X chromosome effective population size with application to sex-specific demographic history.

G3 (Bethesda). 2023 Sep 30;13(10). doi: 10.1093/g3journal/jkad165.

Fast inference of genetic recombination rates in biobank scale data.

Genome Res. 2023 Jul;33(7):1015-1022. doi: 10.1101/gr.277676.123. Epub 2023 Jun 22.

Selecting Clustering Algorithms for Identity-By-Descent Mapping.

Pac Symp Biocomput. 2023;28:121-132.

Statistical phasing of 150,119 sequenced genomes in the UK Biobank.

Am J Hum Genet. 2023 Jan 5;110(1):161-165. doi: 10.1016/j.ajhg.2022.11.008. Epub 2022 Nov 29.

Estimating the genome-wide mutation rate from thousands of unrelated individuals.

Am J Hum Genet. 2022 Dec 1;109(12):2178-2184. doi: 10.1016/j.ajhg.2022.10.015. Epub 2022 Nov 11.

The sequences of 150,119 genomes in the UK Biobank.

Nature. 2022 Jul;607(7920):732-740. doi: 10.1038/s41586-022-04965-x. Epub 2022 Jul 20.

Efficient ancestry and mutation simulation with msprime 1.0.

Genetics. 2022 Mar 3;220(3). doi: 10.1093/genetics/iyab229.

Current Developments in Detection of Identity-by-Descent Methods and Applications.

Front Genet. 2021 Sep 10;12:722602. doi: 10.3389/fgene.2021.722602. eCollection 2021.

Fast two-stage phasing of large-scale sequence data.

Am J Hum Genet. 2021 Oct 7;108(10):1880-1890. doi: 10.1016/j.ajhg.2021.08.005. Epub 2021 Sep 2.

Rapid detection of identity-by-descent tracts for mega-scale datasets.

Nat Commun. 2021 Jun 10;12(1):3546. doi: 10.1038/s41467-021-22910-w.
