Using feedback in pooled experiments augmented with imputation for high genotyping accuracy at reduced cost.

Author information

Clouard Camille, Nettelblad Carl

Affiliations

Division of Scientific Computing, Department of Information Technology, Uppsala University, Uppsala SE-751 05, Sweden.

SciLifeLab, Science for Life Laboratory, Uppsala University, Uppsala SE-751 05, Sweden.

Publication information

G3 (Bethesda). 2025 Mar 18;15(3). doi: 10.1093/g3journal/jkaf010.

DOI:10.1093/g3journal/jkaf010

PMID:39847531

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11917477/

Abstract

Conducting genomic selection (GS) in plant breeding programs can substantially speed up the development of new varieties. GS provides more reliable insights when it is based on dense marker data, in which the rare variants can be particularly informative. Despite the availability of new technologies, the cost of large-scale genotyping remains a major limitation to the implementation of GS. We suggest combining pooled genotyping with population-based imputation as a cost-effective computational strategy for genotyping SNPs. Pooling saves genotyping tests and has proven to accurately capture the rare variants that are usually missed by imputation. In this study, we investigate adding iterative coupling to a joint model of pooling and imputation that we have previously proposed. In each iteration, the imputed genotype probabilities serve as feedback input for adjusting the per-sample prior genotype probabilities, before running a new imputation based on these adjusted data. This flexible setup indirectly imposes consistency between the imputed genotypes and the pooled observations. We demonstrate that repeated cycles of feedback can take advantage of the strengths of both pooling and imputation when an appropriate set of reference haplotypes is available for imputation. The iterations improve greatly upon the initial genotype predictions, achieving very high genotype accuracy for both low- and high-frequency variants. We enhance the average concordance from 94.5% to 98.4% at limited computational cost and without requiring any additional genotype testing.
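The feedback cycle described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the update rule, so the multiplicative update, the error-free pool decoding rule, and the names `pool_consistent_mask`, `feedback_step`, `iterate_feedback`, and `toy_impute` are all assumptions made for illustration. In particular, `toy_impute` is a hypothetical stand-in for a real population-based imputation run against reference haplotypes.

```python
import numpy as np

def pool_consistent_mask(pool_has_ref: bool, pool_has_alt: bool) -> np.ndarray:
    """Genotypes (0, 1, 2 alt-allele copies) a pool member may carry.

    Assumption: allele detection in the pool is error-free.
    """
    return np.array([
        1.0 if pool_has_ref else 0.0,                     # hom-ref needs ref seen
        1.0 if (pool_has_ref and pool_has_alt) else 0.0,  # het needs both seen
        1.0 if pool_has_alt else 0.0,                     # hom-alt needs alt seen
    ])

def feedback_step(prior: np.ndarray, imputed: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Assumed update: reweight the prior by the imputed probabilities,
    zero out genotypes that contradict the pools, and renormalise."""
    updated = prior * imputed * mask
    total = updated.sum(axis=-1, keepdims=True)
    safe_total = np.where(total > 0, total, 1.0)
    # Keep the old prior for a sample if the update leaves no probability mass.
    return np.where(total > 0, updated / safe_total, prior)

def iterate_feedback(prior, impute_fn, mask, n_iter=3):
    """Alternate imputation and prior adjustment for n_iter cycles."""
    for _ in range(n_iter):
        imputed = impute_fn(prior)
        prior = feedback_step(prior, imputed, mask)
    return prior

def toy_impute(prior: np.ndarray) -> np.ndarray:
    """Hypothetical imputation stand-in: blend each sample's prior with
    made-up population genotype frequencies."""
    population = np.array([0.81, 0.18, 0.01])
    return 0.5 * prior + 0.5 * population

# Two samples at one SNP, each starting from a flat prior.
prior = np.full((2, 3), 1.0 / 3.0)
mask = np.vstack([
    # Sample 0 sits in one mixed pool and one ref-only pool.
    pool_consistent_mask(True, True) * pool_consistent_mask(True, False),
    # Sample 1 sits only in mixed pools, so the pools alone cannot decode it.
    pool_consistent_mask(True, True),
])
posterior = iterate_feedback(prior, toy_impute, mask, n_iter=3)
print(posterior)
```

In this sketch the pooled observations act as a hard constraint through `mask`, while the imputed probabilities progressively sharpen the genotypes that pooling leaves ambiguous. A real pipeline would replace `toy_impute` with an actual imputation tool and would likely use a softer, design-aware adjustment.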
