Guiglielmoni Nadège, Villegas Laura I, Kirangwa Joseph, Schiffer Philipp H
Institut für Zoologie, Universität zu Köln, Cologne, Germany.
Front Genet. 2024 Feb 7;15:1308527. doi: 10.3389/fgene.2024.1308527. eCollection 2024.
High-quality genomes assembled from long-read data give a better picture of heterozygosity levels and repeat content, and support more accurate gene annotation and prediction, than assemblies built from short-read data. They also make it possible to study haplotype divergence. Advances in long-read sequencing technologies in recent years have made it possible to produce such high-quality assemblies for non-model organisms. This allows us to revisit genomes that were difficult to scaffold to chromosome scale with previous generations of sequencing data and assembly software. Nematoda, one of the most diverse and species-rich metazoan phyla, remains poorly studied, and many of its previously assembled genomes are fragmented. Using long reads obtained with Nanopore R10.4.1 and PacBio HiFi, we generated highly contiguous assemblies of a diploid nematode of the family Mermithidae, for which no closely related genomes are available to date, as well as a collapsed assembly and a phased assembly of a triploid nematode of the family Panagrolaimidae. Both genomes had been analysed before, but the earlier, fragmented assemblies had scaffold sizes comparable to the lengths of the raw long reads themselves. Our new assemblies illustrate how long-read technologies allow for a much better representation of species genomes, and we are now able to conduct more accurate downstream analyses based on more complete gene and transposable element predictions.
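Comparing scaffold sizes with read lengths, as done above for the earlier assemblies, is commonly summarised with the N50 statistic: the length L such that sequences of length at least L together cover at least half of the total. The abstract does not name a specific metric, so the following is only a minimal sketch that assumes N50 as the yardstick. The function name `n50` and all length values are hypothetical and for illustration only; they are not data from this study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together account for at least half of the summed length."""
    sorted_lengths = sorted(lengths, reverse=True)  # longest first
    half_total = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input has no N50


# Hypothetical lengths (bp), NOT taken from the study.
scaffolds = [45_000, 30_000, 22_000, 15_000, 9_000, 6_000]
reads = [40_000, 28_000, 25_000, 12_000, 8_000]

scaffold_n50 = n50(scaffolds)
read_n50 = n50(reads)
# A ratio close to 1 means scaffolds are barely longer than single reads,
# i.e. the assembly adds little contiguity beyond the raw data.
print(scaffold_n50, read_n50, round(scaffold_n50 / read_n50, 2))
```

With these made-up values the scaffold N50 (30,000 bp) is only slightly above the read N50 (28,000 bp), which is the kind of situation the abstract describes for the earlier, fragmented assemblies.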