Genotype error biases trio-based estimates of haplotype phase accuracy.

Author information

Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA; Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.

Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.

Publication information

Am J Hum Genet. 2022 Jun 2;109(6):1016-1025. doi: 10.1016/j.ajhg.2022.04.019.

Abstract

Haplotypes can be estimated from unphased genotype data via statistical methods. When parent-offspring trios are available for inferring the true phase from Mendelian inheritance rules, the accuracy of statistical phasing is usually measured by the switch error rate, which is the proportion of pairs of consecutive heterozygotes that are incorrectly phased. We present a method for estimating the genotype error rate from parent-offspring trios and a method for estimating the bias that occurs in the observed switch error rate as a result of genotype error. We apply these methods to 485,301 genotyped UK Biobank samples that include 898 White British trios and to 38,387 sequenced TOPMed samples that include 217 African Caribbean trios and 669 European American trios. We show that genotype error inflates the observed switch error rate and that the relative bias increases with sample size. For the UK Biobank White British trios, the observed switch error rate in the trio offspring is 2.4 times larger than the estimated true switch error rate (1.4 × 10⁻³ vs. 5.8 × 10⁻⁴). We propose an alternate definition of phase error that counts two consecutive switch errors as a single error because back-to-back switch errors arise when a single heterozygote is incorrectly phased with respect to the surrounding heterozygotes. With this definition, we estimate that the average distance between phase errors is 64 megabases in the UK Biobank White British individuals.
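
The two error definitions in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that phase at each heterozygous site is encoded as 0 or 1 (which haplotype carries the alternate allele), that the trio-derived phase is taken as truth, and that runs of switch errors are paired greedily from left to right. The function names `switch_flags`, `switch_error_rate`, and `merged_phase_errors` are hypothetical, and the treatment of runs longer than two is an assumption rather than something the abstract specifies.

```python
from typing import List, Sequence


def switch_flags(true_phase: Sequence[int], est_phase: Sequence[int]) -> List[bool]:
    """Flag each pair of consecutive heterozygotes whose relative phase is wrong.

    Each input holds one 0/1 value per heterozygous site (assumed encoding).
    """
    if len(true_phase) != len(est_phase):
        raise ValueError("phase vectors must have the same length")
    flags = []
    for i in range(len(true_phase) - 1):
        true_rel = true_phase[i] ^ true_phase[i + 1]  # relative phase in truth
        est_rel = est_phase[i] ^ est_phase[i + 1]     # relative phase in estimate
        flags.append(true_rel != est_rel)
    return flags


def switch_error_rate(flags: Sequence[bool]) -> float:
    """Proportion of consecutive heterozygote pairs that are incorrectly phased."""
    return sum(flags) / len(flags) if flags else 0.0


def merged_phase_errors(flags: Sequence[bool]) -> int:
    """Count phase errors, treating two back-to-back switch errors as one.

    Assumption: longer runs are paired greedily from the left.
    """
    errors = 0
    i = 0
    while i < len(flags):
        if flags[i]:
            errors += 1
            # Absorb an immediately following switch error into this one.
            if i + 1 < len(flags) and flags[i + 1]:
                i += 1
        i += 1
    return errors


# One heterozygote (index 2) is flipped relative to its neighbours.
truth = [0, 0, 0, 0, 0, 0]
estimate = [0, 0, 1, 0, 0, 0]
flags = switch_flags(truth, estimate)
print(sum(flags), switch_error_rate(flags), merged_phase_errors(flags))
```

In this toy example a single mis-phased heterozygote produces two switch errors among five consecutive pairs but only one error under the merged definition, which is the situation the alternate definition is meant to capture.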

Similar articles

Statistical phasing of 150,119 sequenced genomes in the UK Biobank.
Am J Hum Genet. 2023 Jan 5;110(1):161-165. doi: 10.1016/j.ajhg.2022.11.008. Epub 2022 Nov 29.

Estimating the Genome-wide Mutation Rate with Three-Way Identity by Descent.
Am J Hum Genet. 2019 Nov 7;105(5):883-893. doi: 10.1016/j.ajhg.2019.09.012. Epub 2019 Oct 3.

Fast two-stage phasing of large-scale sequence data.
Am J Hum Genet. 2021 Oct 7;108(10):1880-1890. doi: 10.1016/j.ajhg.2021.08.005. Epub 2021 Sep 2.
