Fresh Forward Breeding B.V., Huissen, The Netherlands.
Wageningen University and Research Plant Breeding, Wageningen, The Netherlands.
BMC Genomics. 2024 Nov 28;25(1):1150. doi: 10.1186/s12864-024-10987-8.
The allo-octoploid Fragaria x ananassa follows disomic inheritance, yet the high sequence similarity among its subgenomes can lead to misalignment of short sequencing reads (150 bp). This misalignment results in an increased number of erroneous variants during variant calling. To accurately associate traits with the appropriate subgenome, it is essential to filter out these erroneous variants. By classifying variants into correct (type 1) and erroneous types (homoeologous variants, type 2; multi-locus variants, type 3), we can improve the reliability of downstream analyses.
Our analysis reveals that while erroneous variant types often display skewed average allele balances (AAB) for heterozygous calls, this measure alone is insufficient. To further reduce erroneous variants, we employed a linkage disequilibrium (LD)-based filtering method that correlates highly (99%) with an approach that utilizes a genetic map from a biparental population. The combined filtering strategy, which uses both the LD-based and the average allele balance methods, resulted in the lowest switch error rate (0.037). Notably, our best filtering approach decreased phasing switch error rates by 44% and preserved 72% of the original dataset.
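As an illustration of the AAB measure mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes that the allele balance of a heterozygous call is the fraction of reads supporting the alternate allele, and that AAB is the mean of that fraction over all heterozygous samples at a variant. The function names, the input layout, and the 0.3 to 0.7 acceptance window are hypothetical choices made only for this example.

```python
from statistics import mean

# Genotype strings treated as heterozygous (assumed diploid-style calls).
HET_GENOTYPES = {"0/1", "1/0", "0|1", "1|0"}


def average_allele_balance(calls):
    """Return the mean alt-read fraction over heterozygous calls.

    `calls` is a list of (genotype, ref_depth, alt_depth) tuples, one per
    sample. Returns None when no heterozygous call has read coverage.
    """
    balances = [
        alt / (ref + alt)
        for genotype, ref, alt in calls
        if genotype in HET_GENOTYPES and (ref + alt) > 0
    ]
    return mean(balances) if balances else None


def passes_aab(calls, low=0.3, high=0.7):
    """Keep a variant only if its AAB lies inside [low, high].

    The window is a hypothetical threshold, not a value from the study.
    Variants without heterozygous calls cannot be evaluated and are
    rejected here, which is also an assumption of this sketch.
    """
    aab = average_allele_balance(calls)
    return aab is not None and low <= aab <= high


# A balanced variant (AAB near 0.5) and a skewed one (AAB near 0.1).
balanced = [("0/1", 10, 10), ("0/1", 12, 8), ("0/0", 20, 0)]
skewed = [("0/1", 18, 2), ("0/1", 17, 3)]
print(passes_aab(balanced), passes_aab(skewed))
```

A skewed AAB is expected when reads from a homoeologous or paralogous locus are stacked onto one position, because the apparent heterozygote then no longer shows roughly equal support for both alleles. As the abstract notes, this signal alone does not catch every erroneous variant, which is why it is combined with the LD-based filter.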
The results indicate that erroneous variants caused by subgenome similarity can be identified effectively without extensive genotyping of mapping populations. Implementing the LD-based filtering method improved phasing accuracy, which in turn improves the traceability of important alleles in the germplasm and paves the way for a better understanding of trait associations in F. x ananassa.
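The switch error rate used to assess phasing accuracy can be sketched as follows. This is a generic illustration under stated assumptions rather than the evaluation code of the study: haplotypes are given as lists of 0/1 alleles at heterozygous sites for one chromosome copy, a switch is counted whenever the inferred haplotype changes which true haplotype it follows between two consecutive sites, and the rate is the number of switches divided by the number of consecutive site pairs. The function name and input encoding are hypothetical.

```python
def switch_error_rate(truth, inferred):
    """Fraction of consecutive heterozygous site pairs with a phase switch.

    `truth` and `inferred` hold the allele (0 or 1) carried by one
    haplotype at each heterozygous site, in genomic order.
    """
    if len(truth) != len(inferred):
        raise ValueError("haplotypes must cover the same sites")

    # True where the inferred haplotype carries the opposite allele,
    # i.e. where it follows the other true haplotype.
    flipped = [t != i for t, i in zip(truth, inferred)]

    # A switch occurs whenever the flipped state changes between neighbours.
    switches = sum(a != b for a, b in zip(flipped, flipped[1:]))
    opportunities = len(flipped) - 1
    return switches / opportunities if opportunities > 0 else 0.0


truth = [0, 1, 0, 0, 1, 1]
inferred = [0, 1, 1, 1, 1, 1]  # phase switches after site 2 and after site 4
print(switch_error_rate(truth, inferred))
```

Under this definition a haplotype that is inverted along its whole length scores zero, since the labelling of the two haplotypes is arbitrary and only changes in phase between neighbouring sites count as errors.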