Madrigal Giovanni, Minhas Bushra Fazal, Catchen Julian
Department of Evolution, Ecology, and Behavior, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.
Informatics Program, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.
Mol Ecol Resour. 2025 Jan;25(1):e13982. doi: 10.1111/1755-0998.13982. Epub 2024 May 27.
The improvement and decreasing costs of third-generation sequencing technologies have widened the scope of biological questions researchers can address with de novo genome assemblies. With the increasing number of reference genomes, validating their integrity with minimal overhead is vital for establishing confidence in the results of their applications. Here, we present Klumpy, a tool for detecting and visualizing both misassembled regions in a genome assembly and genetic elements (e.g. genes) of interest in a set of sequences. Using the initial raw reads in combination with their respective genome assemblies, we illustrate Klumpy's utility by investigating antifreeze glycoprotein (afgp) loci across two icefishes, by searching for a gene reported as absent in the northern snakehead fish, and by scanning the reference genomes of a mudskipper and a bumblebee for misassembled regions. In the first two cases, we were able to provide support for the noncanonical placement of an afgp locus in the icefishes and to locate the missing snakehead gene. Furthermore, our genome scans were able to identify an unmappable locus in the mudskipper reference genome and a putative repetitive element shared among several species of bees.