Kraege Anton, Chavarro-Carrero Edgar, Schnell Eva, Heilmann-Heimbach Stefanie, Becker Kerstin, Köhrer Karl, Huettel Bruno, Sargheini Nafiseh, Schiffer Philipp, Waldvogel Ann-Marie, Thomma Bart P H J, Rovenich Hanna
Institute of Plant Sciences, Department of Biology, University of Cologne, Zülpicher Straße 47b, Cologne 50674, Germany.
Institute of Human Genetics, University Hospital of Bonn, University of Bonn, Venusberg, Sigmund-Freud-Straße 25, Bonn 53127, Germany.
G3 (Bethesda). 2025 Feb 5;15(2). doi: 10.1093/g3journal/jkae294.
Unicellular green algae of the genus Coccomyxa are recognized for their worldwide distribution and ecological versatility. Coccomyxa elongata is a freshwater species of the Coccomyxa simplex clade, which also includes lichen symbionts. To facilitate future molecular and phylogenomic studies of this versatile clade of algae, we generated a high-quality genome assembly for C. elongata Chodat & Jaag SAG 216-3b within the framework of the Biodiversity Genomics Center Cologne (BioC2) initiative. A combination of long-read PacBio HiFi and Oxford Nanopore Technologies sequencing with chromatin conformation capture (Hi-C) sequencing led to the assembly of the genome into 21 scaffolds with a total length of 51.4 Mb and an N50 of 2.8 Mb. Nineteen of the scaffolds represent highly complete nuclear chromosomes delimited by telomeric repeats, while the two additional scaffolds represent the mitochondrial and plastid genomes. Transcriptome-guided gene annotation identified 14,811 protein-coding genes, of which 61% have annotated protein family domains and 841 are predicted to be secreted. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis against the Chlorophyta database identified a total of 1,494 (98.4%) complete gene models, suggesting a highly complete genome annotation.
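For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed: sort the scaffold lengths from longest to shortest and report the length of the scaffold at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are hypothetical illustrations and are not the actual C. elongata scaffold lengths, which the abstract does not list.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


# Hypothetical toy scaffold lengths in Mb (not the real assembly).
toy_scaffolds = [5, 4, 3, 2, 1]
print(n50(toy_scaffolds))  # 5 + 4 = 9 covers more than half of 15, so this prints 4
```

Applied to a real assembly, the input would be the list of scaffold lengths in base pairs, for example as read from a FASTA index.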