Detecting identity by descent and estimating genotype error rates in sequence data.

Author information

Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA.

Publication information

Am J Hum Genet. 2013 Nov 7;93(5):840-51. doi: 10.1016/j.ajhg.2013.09.014. Epub 2013 Oct 24.

DOI:10.1016/j.ajhg.2013.09.014

PMID:24207118

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3824133/

Abstract

Existing methods for identity by descent (IBD) segment detection were designed for SNP array data, not sequence data. Sequence data have a much higher density of genetic variants and a different allele frequency distribution, and can have higher genotype error rates. Consequently, best practices for IBD detection in SNP array data do not necessarily carry over to sequence data. We present a method, IBDseq, for detecting IBD segments in sequence data and a method, SEQERR, for estimating genotype error rates at low-frequency variants by using detected IBD. The IBDseq method estimates probabilities of genotypes observed with error for each pair of individuals under IBD and non-IBD models. The ratio of estimated probabilities under the two models gives a LOD score for IBD. We evaluate several IBD detection methods that are fast enough for application to sequence data (IBDseq, Beagle Refined IBD, PLINK, and GERMLINE) under multiple parameter settings, and we show that IBDseq achieves high power and accuracy for IBD detection in sequence data. The SEQERR method estimates genotype error rates by comparing observed and expected rates of pairs of homozygote and heterozygote genotypes at low-frequency variants in IBD segments. We demonstrate the accuracy of SEQERR in simulated data, and we apply the method to estimate genotype error rates in sequence data from the UK10K and 1000 Genomes projects.
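To make the LOD-score idea concrete, the following is a minimal sketch and not the IBDseq implementation. It assumes a single biallelic marker in Hardy-Weinberg equilibrium, an IBD model in which the two individuals share exactly one haplotype, and a simple symmetric genotype error model (with probability `eps` an observed genotype is wrong, split evenly between the two other genotypes). All function names, the error model, and the per-marker independence used in `segment_lod` are assumptions made for illustration; the abstract does not specify them.

```python
import math
from itertools import product


def allele_prob(a, p):
    """Probability of allele a (1 = alternate, 0 = reference) given alt frequency p."""
    return p if a == 1 else 1.0 - p


def true_pair_probs(p, ibd):
    """Joint distribution of the true genotypes (alt-allele counts) of two individuals.

    ibd=True : the pair shares one haplotype (alleles a+b and a+c).
    ibd=False: all four alleles are drawn independently.
    """
    probs = {pair: 0.0 for pair in product(range(3), repeat=2)}
    if ibd:
        for a, b, c in product((0, 1), repeat=3):
            w = allele_prob(a, p) * allele_prob(b, p) * allele_prob(c, p)
            probs[(a + b, a + c)] += w
    else:
        for a, b, c, d in product((0, 1), repeat=4):
            w = (allele_prob(a, p) * allele_prob(b, p)
                 * allele_prob(c, p) * allele_prob(d, p))
            probs[(a + b, c + d)] += w
    return probs


def error_prob(obs, true, eps):
    """Assumed error model: correct with prob 1 - eps, else either other genotype."""
    return 1.0 - eps if obs == true else eps / 2.0


def observed_pair_prob(o1, o2, p, eps, ibd):
    """Probability of the observed genotype pair, summing over true genotypes."""
    return sum(
        pr * error_prob(o1, g1, eps) * error_prob(o2, g2, eps)
        for (g1, g2), pr in true_pair_probs(p, ibd).items()
    )


def marker_lod(o1, o2, p, eps):
    """log10 ratio of the IBD and non-IBD probabilities at one marker."""
    return math.log10(
        observed_pair_prob(o1, o2, p, eps, True)
        / observed_pair_prob(o1, o2, p, eps, False)
    )


def segment_lod(genos1, genos2, freqs, eps):
    """Sum of per-marker LOD scores, treating markers as independent (an assumption)."""
    return sum(marker_lod(o1, o2, p, eps)
               for o1, o2, p in zip(genos1, genos2, freqs))
```

Under these assumptions, two individuals who are both heterozygous at a rare variant contribute a positive score, while opposite homozygotes contribute a negative score that depends strongly on `eps`. This illustrates why the genotype error rate matters for IBD detection in sequence data. Note that `eps` must be greater than zero whenever opposite homozygotes can occur, since the IBD probability is otherwise zero and the logarithm is undefined.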
