Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA; Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
Am J Hum Genet. 2022 Jun 2;109(6):1016-1025. doi: 10.1016/j.ajhg.2022.04.019.
Haplotypes can be estimated from unphased genotype data via statistical methods. When parent-offspring trios are available for inferring the true phase from Mendelian inheritance rules, the accuracy of statistical phasing is usually measured by the switch error rate, which is the proportion of pairs of consecutive heterozygotes that are incorrectly phased. We present a method for estimating the genotype error rate from parent-offspring trios and a method for estimating the bias that occurs in the observed switch error rate as a result of genotype error. We apply these methods to 485,301 genotyped UK Biobank samples that include 898 White British trios and to 38,387 sequenced TOPMed samples that include 217 African Caribbean trios and 669 European American trios. We show that genotype error inflates the observed switch error rate and that the relative bias increases with sample size. For the UK Biobank White British trios, the observed switch error rate in the trio offspring is 2.4 times larger than the estimated true switch error rate (1.4 × 10 vs 5.8 × 10). We propose an alternate definition of phase error that counts two consecutive switch errors as a single error because back-to-back switch errors arise when a single heterozygote is incorrectly phased with respect to the surrounding heterozygotes. With this definition, we estimate that the average distance between phase errors is 64 megabases in the UK Biobank White British individuals.
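The two error definitions in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not model genotype error or the bias correction. It assumes a hypothetical encoding in which each heterozygous site is represented by the allele (0 or 1) carried on the first haplotype, with one list for the trio-derived true phase and one for the statistically estimated phase. The function names and the greedy left-to-right pairing of consecutive switch errors are illustrative assumptions.

```python
from typing import List, Sequence


def switch_flags(true_phase: Sequence[int], est_phase: Sequence[int]) -> List[bool]:
    """Flag each pair of consecutive heterozygotes that is incorrectly phased.

    Assumed encoding: element i is the allele (0 or 1) on haplotype 1
    at the i-th heterozygous site.
    """
    if len(true_phase) != len(est_phase):
        raise ValueError("phase vectors must have the same length")
    # mismatch[i] is 1 when the estimated haplotype 1 carries the other allele
    mismatch = [t ^ e for t, e in zip(true_phase, est_phase)]
    # A switch error occurs where the mismatch state changes between neighbours
    return [mismatch[i] != mismatch[i + 1] for i in range(len(mismatch) - 1)]


def switch_error_rate(flags: Sequence[bool]) -> float:
    """Proportion of consecutive heterozygote pairs that are incorrectly phased."""
    return sum(flags) / len(flags) if flags else 0.0


def count_phase_errors(flags: Sequence[bool]) -> int:
    """Count errors, treating two back-to-back switch errors as a single error.

    Pairing is done greedily from left to right (an assumption of this sketch).
    """
    count = 0
    i = 0
    while i < len(flags):
        if flags[i]:
            count += 1
            # Absorb an immediately following switch error into the same event
            if i + 1 < len(flags) and flags[i + 1]:
                i += 1
        i += 1
    return count


# Toy data: site 2 is a single mis-phased heterozygote (two back-to-back
# switch errors), and a genuine switch occurs between sites 4 and 5.
true_phase = [0, 0, 0, 0, 0, 0, 0]
est_phase = [0, 0, 1, 0, 0, 1, 1]

flags = switch_flags(true_phase, est_phase)
print(sum(flags), switch_error_rate(flags), count_phase_errors(flags))
```

In this toy example the single mis-phased heterozygote contributes two switch errors but only one phase error under the alternate definition, which is why the two counts differ.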