Department of Agronomy, Iowa State University, Ames, IA 50011, USA.
Department of Statistics, Iowa State University, Ames, IA 50011, USA.
Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac729.
Genotyping by sequencing is a powerful tool for investigating genetic variation in plants, but many economically important plants are allopolyploids, where homoeologous similarity obscures the subgenomic origin of reads and confounds allelic and homoeologous SNPs. Recent polyploid genotyping methods use allelic frequencies, rate of heterozygosity, parental cross or other information to resolve read assignment, but good subgenomic references offer the most direct information. The typical strategy aligns reads to the joint reference, performs diploid genotyping within each subgenome, and filters the results, but persistent read misassignment results in an excess of false heterozygous calls.
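The excess of false heterozygous calls can be illustrated with a toy calculation. The sketch below is not taken from the paper or from any genotyping tool: it assumes a simple binomial diploid genotype likelihood at a single site, a hypothetical per-base error rate `err`, and made-up read counts. It shows only how a modest number of reads misassigned from the other subgenome can turn a homozygous site into an apparent heterozygote.

```python
import math

def diploid_log_likelihoods(n_ref, n_alt, err=0.01):
    """Binomial log-likelihood of each diploid genotype given allele counts.

    Assumption: each read shows the alternate base with probability
    err (0/0), 0.5 (0/1), or 1 - err (1/1).
    """
    p_alt = {"0/0": err, "0/1": 0.5, "1/1": 1.0 - err}
    return {
        g: n_alt * math.log(p) + n_ref * math.log(1.0 - p)
        for g, p in p_alt.items()
    }

def call_genotype(n_ref, n_alt, err=0.01):
    """Return the maximum-likelihood genotype label."""
    ll = diploid_log_likelihoods(n_ref, n_alt, err)
    return max(ll, key=ll.get)

# Hypothetical site: subgenome A is truly homozygous reference, while
# subgenome B carries a different base at the homoeologous position.
clean_call = call_genotype(n_ref=30, n_alt=0)         # all reads correctly assigned
contaminated_call = call_genotype(n_ref=30, n_alt=8)  # 8 subgenome-B reads misassigned to A

print(clean_call, contaminated_call)
```

In this toy setting, filtering after the fact cannot distinguish the misassigned reads from a true alternate allele, because the diploid model sees only the pooled counts.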
We introduce the Comprehensive Allopolyploid Genotyper (CAPG), which formulates an explicit likelihood to weight read alignments against both subgenomic references and to genotype individual allopolyploids from whole-genome resequencing data. We demonstrate CAPG in allotetraploids, where it performs better than Genome Analysis Toolkit's HaplotypeCaller applied to reads aligned to the combined subgenomic references.
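The idea of weighting a read's alignment against both subgenomic references, rather than committing it to one, can be sketched as follows. This is a minimal illustration and not CAPG's actual likelihood: it assumes ungapped alignments of equal length, a single hypothetical error rate `err`, equal prior weight on the two subgenomes, and it treats every mismatch as sequencing error, ignoring allelic variation. The function names and sequences are invented for the example.

```python
import math

def read_log_likelihood(read, ref, err=0.01):
    """Log-probability of a read given the reference segment it aligns to.

    Assumption: bases are independent; a match has probability 1 - err
    and each specific mismatch has probability err / 3.
    """
    return sum(
        math.log(1.0 - err) if r == b else math.log(err / 3.0)
        for r, b in zip(read, ref)
    )

def subgenome_weights(read, ref_a, ref_b, err=0.01):
    """Posterior weights that the read originated from subgenome A or B."""
    log_a = read_log_likelihood(read, ref_a, err)
    log_b = read_log_likelihood(read, ref_b, err)
    top = max(log_a, log_b)  # subtract the maximum for numerical stability
    w_a = math.exp(log_a - top)
    w_b = math.exp(log_b - top)
    total = w_a + w_b
    return w_a / total, w_b / total

# Hypothetical homoeologous segments that differ at one position.
ref_a = "ACGTACGT"
ref_b = "ACGTTCGT"

informative = subgenome_weights("ACGTACGT", ref_a, ref_b)  # covers the difference
ambiguous = subgenome_weights("ACGT", ref_a[:4], ref_b[:4])  # covers only shared bases

print(informative, ambiguous)
```

A read that spans a homoeologous difference receives nearly all of its weight on one subgenome, while a read covering only shared sequence is split evenly, so it contributes fractionally to both subgenomes instead of being forced into one.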
Code and tutorials are available at https://github.com/Kkulkarni1/CAPG.git.
Supplementary data are available at Bioinformatics online.