Detecting identity by descent and estimating genotype error rates in sequence data.

Affiliation

Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA.

Publication information

Am J Hum Genet. 2013 Nov 7;93(5):840-51. doi: 10.1016/j.ajhg.2013.09.014. Epub 2013 Oct 24.

Abstract

Existing methods for identity by descent (IBD) segment detection were designed for SNP array data, not sequence data. Sequence data have a much higher density of genetic variants and a different allele frequency distribution, and can have higher genotype error rates. Consequently, best practices for IBD detection in SNP array data do not necessarily carry over to sequence data. We present a method, IBDseq, for detecting IBD segments in sequence data and a method, SEQERR, for estimating genotype error rates at low-frequency variants by using detected IBD. The IBDseq method estimates probabilities of genotypes observed with error for each pair of individuals under IBD and non-IBD models. The ratio of estimated probabilities under the two models gives a LOD score for IBD. We evaluate several IBD detection methods that are fast enough for application to sequence data (IBDseq, Beagle Refined IBD, PLINK, and GERMLINE) under multiple parameter settings, and we show that IBDseq achieves high power and accuracy for IBD detection in sequence data. The SEQERR method estimates genotype error rates by comparing observed and expected rates of pairs of homozygote and heterozygote genotypes at low-frequency variants in IBD segments. We demonstrate the accuracy of SEQERR in simulated data, and we apply the method to estimate genotype error rates in sequence data from the UK10K and 1000 Genomes projects.
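
The LOD score described above, the ratio of the probability of a pair's observed genotypes under an IBD model to that under a non-IBD model, can be illustrated with a minimal sketch. This is not the IBDseq implementation. It assumes a biallelic marker in Hardy-Weinberg proportions, exactly one haplotype shared IBD, a genotype error that replaces the true genotype with either other genotype with equal probability, and independent markers. All function names are hypothetical.

```python
import math

def hwe(p):
    """Genotype probabilities for 0, 1, 2 minor alleles under Hardy-Weinberg."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]

def joint_non_ibd(p):
    """Joint true-genotype probabilities for two unrelated individuals."""
    g = hwe(p)
    return [[g[i] * g[j] for j in range(3)] for i in range(3)]

def joint_ibd1(p):
    """Joint true-genotype probabilities when one haplotype is shared IBD.

    Each individual carries the shared allele plus one independently
    drawn allele (assumption: exactly one shared haplotype).
    """
    allele = [1.0 - p, p]
    joint = [[0.0] * 3 for _ in range(3)]
    for shared in (0, 1):
        for a in (0, 1):
            for b in (0, 1):
                joint[shared + a][shared + b] += allele[shared] * allele[a] * allele[b]
    return joint

def error_matrix(eps):
    """P(observed | true): correct with probability 1 - eps, else eps / 2 each."""
    return [[1.0 - eps if t == o else eps / 2.0 for o in range(3)] for t in range(3)]

def observed_prob(joint, err, o1, o2):
    """Probability of the observed genotype pair, summing over true genotypes."""
    return sum(joint[t1][t2] * err[t1][o1] * err[t2][o2]
               for t1 in range(3) for t2 in range(3))

def marker_lod(o1, o2, p, eps):
    """Base-10 log of the IBD to non-IBD probability ratio at one marker.

    Requires 0 < p < 1 and eps > 0 so that both probabilities are positive.
    """
    err = error_matrix(eps)
    p_ibd = observed_prob(joint_ibd1(p), err, o1, o2)
    p_non = observed_prob(joint_non_ibd(p), err, o1, o2)
    return math.log10(p_ibd / p_non)

def segment_lod(genos1, genos2, freqs, eps):
    """Sum per-marker LOD scores, treating markers as independent (assumption)."""
    return sum(marker_lod(a, b, p, eps) for a, b, p in zip(genos1, genos2, freqs))
```

Under these assumptions, two individuals who are both heterozygous at a rare variant contribute a positive score, while opposite homozygotes contribute a strongly negative one that the error rate keeps finite.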
