Effect of genotyping errors on linkage map construction based on repeated chip analysis of two recombinant inbred line populations in wheat (Triticum aestivum L.).

Author information

Wang Xinru, Wang Jiankang, Xia Xianchun, Xu Xiaowan, Li Lingli, Cao Shuanghe, Hao Yuanfeng, Zhang Luyan

Affiliations

State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100081, China.

Publication information

BMC Plant Biol. 2024 Apr 22;24(1):306. doi: 10.1186/s12870-024-05005-8.

DOI:10.1186/s12870-024-05005-8

PMID:38644480

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11034145/

Abstract

Linkage maps are essential for genetic mapping of phenotypic traits, map-based gene cloning, and marker-assisted selection in breeding applications. Construction of a high-quality saturated map requires high-quality genotypic data on a large number of molecular markers. Errors in genotyping cannot be completely avoided, no matter what platform is used. When the genotyping error rate reaches a threshold level, it seriously affects the accuracy of the constructed map and the reliability of subsequent genetic studies. In this study, repeated genotyping of two recombinant inbred line (RIL) populations derived from the crosses Yangxiaomai × Zhongyou 9507 and Jingshuang 16 × Bainong 64 was used to investigate the effect of genotyping errors on linkage map construction. Inconsistent data points between the two replicates were regarded as genotyping errors, which were classified into three types. Genotyping errors were treated as missing values to generate the non-erroneous data set. First, linkage maps were constructed using the two replicates as well as the non-erroneous data set. Second, error correction methods implemented in the software packages QTL IciMapping (EC) and Genotype-Corrector (GC) were applied to the two replicates. Linkage maps were then constructed from the corrected genotypes and compared with those from the non-erroneous data set. A simulation study considering different levels of genotyping error was performed to investigate the impact of errors and the accuracy of the error correction methods. Results indicated that map length and marker order differed among the two replicates and the non-erroneous data set in both RIL populations. For both actual and simulated populations, map length expanded as the error rate increased, and the correlation coefficient between linkage and physical maps became lower. Map quality can be improved by repeated genotyping and error correction algorithms. When it is impossible to genotype the whole mapping population repeatedly, repeatedly genotyping 30% of it is recommended. The EC method had a much lower false positive rate than the GC method under different error rates. This study systematically describes the impact of genotyping errors on linkage analysis and provides potential guidelines for improving the accuracy of linkage maps in the presence of genotyping errors.
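The step that turns two replicates into the non-erroneous data set can be illustrated with a minimal sketch. It is not the authors' code: the function name `build_consensus`, the genotype codes "A" and "B", and the use of `None` for a missing call are all hypothetical. It also assumes that a call missing in either replicate stays missing and is not counted as an error, a point the abstract does not specify.

```python
from typing import List, Optional, Tuple

# Rows are markers, columns are RILs. "A"/"B" are hypothetical parental
# genotype codes; None is a hypothetical code for a missing call.
Genotypes = List[List[Optional[str]]]


def build_consensus(rep1: Genotypes, rep2: Genotypes) -> Tuple[Genotypes, float]:
    """Mask calls that disagree between two replicates.

    Returns the masked genotype matrix and the fraction of discordant calls
    among data points observed in both replicates.
    """
    if len(rep1) != len(rep2):
        raise ValueError("replicates must have the same number of markers")

    consensus: Genotypes = []
    compared = 0    # data points observed in both replicates
    discordant = 0  # data points whose two calls disagree

    for row1, row2 in zip(rep1, rep2):
        if len(row1) != len(row2):
            raise ValueError("replicates must have the same number of lines")
        out_row: List[Optional[str]] = []
        for g1, g2 in zip(row1, row2):
            if g1 is None or g2 is None:
                # Assumption: a call missing in either replicate stays missing.
                out_row.append(None)
                continue
            compared += 1
            if g1 == g2:
                out_row.append(g1)
            else:
                # Inconsistent call: regarded as an error, set to missing.
                discordant += 1
                out_row.append(None)
        consensus.append(out_row)

    rate = discordant / compared if compared else 0.0
    return consensus, rate


# Toy example: 2 markers x 3 lines, one discordant call, one missing call.
rep1 = [["A", "B", "A"], ["B", "B", None]]
rep2 = [["A", "A", "A"], ["B", "B", "B"]]
consensus, rate = build_consensus(rep1, rep2)
print(consensus)  # [['A', None, 'A'], ['B', 'B', None]]
print(rate)       # 0.2
```

The masked matrix would then be passed to whatever map construction software is in use. The discordance rate is only an estimate of the error rate, since errors that happen to agree across both replicates go undetected.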
