Hu Ming, Wan Penglong, Chen Chengjie, Tang Shuyuan, Chen Jiahao, Wang Liang, Chakraborty Mahul, Zhou Yongfeng, Chen Jinfeng, Gaut Brandon S, Emerson J J, Liao Yi
Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (South China), Ministry of Agriculture and Rural Affairs, College of Horticulture, South China Agricultural University, Guangdong 510642, China.
These authors contributed equally to this work.
bioRxiv. 2025 Feb 8:2025.02.07.637096. doi: 10.1101/2025.02.07.637096.
Comparing complete genome assemblies offers a direct way to characterize all genetic differences among them. However, existing tools are often limited to specific aligners or optimized for specific organisms, which narrows their applicability, particularly for large and repetitive plant genomes. Here, we introduce SVGAP, a pipeline for structural variant (SV) discovery, genotyping, and annotation from high-quality genome assemblies at the population level. Through extensive benchmarks using simulated SV datasets in individual, population, and phylogenetic contexts, we demonstrate that SVGAP performs favorably relative to existing tools in SV discovery. Additionally, SVGAP is one of the few tools to address the challenge of genotyping SVs across large samples of assembled genomes, and it generates fully genotyped VCF files. Applying SVGAP to 26 maize genomes revealed hidden genomic diversity in centromeres, driven by abundant insertions of centromere-specific LTR-retrotransposons. The output of SVGAP is well suited for pan-genome construction and facilitates the interpretation of previously unexplored genomic regions.