Department of Biology, Duke University.
Beijing Genomics Institute-Qingdao, China.
Genome Biol Evol. 2020 Jul 1;12(7):1080-1086. doi: 10.1093/gbe/evaa101.
Lytechinus variegatus is a camarodont sea urchin found widely throughout the western Atlantic Ocean in a variety of shallow-water marine habitats. Its distribution, abundance, and amenability to developmental perturbation make it a popular model for ecologists and developmental biologists. Here, we present a chromosome-level genome assembly of L. variegatus generated from a combination of PacBio long reads, 10x Genomics sequencing, and Hi-C chromatin interaction sequencing. We show that L. variegatus has 19 chromosomes, with a total assembly size of 870.4 Mb. The contiguity and completeness of this assembly are reflected by a scaffold N50 of 45.5 Mb and a BUSCO completeness score of 95.5%. Ab initio and transcript-informed gene modeling and annotation identified 27,232 genes with an average gene length of 12.6 kb, comprising an estimated 39.5% of the genome. Repetitive regions, on the other hand, make up 45.4% of the genome. Physical mapping of well-studied developmental genes onto each chromosome reveals a nonrandom spatial distribution of distinct genes and gene families, which provides insight into how certain gene families may have evolved and how they are transcriptionally regulated in this species. Lastly, aligning RNA-seq and ATAC-seq data to this assembly demonstrates that highly contiguous, complete genome assemblies offer value for functional genomics analyses that fragmented, incomplete assemblies cannot provide. This genome will be of great value to the scientific community as a resource for studies of genome evolution, development, and ecology in this species and in the Echinodermata.
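For readers unfamiliar with the contiguity statistic quoted above, the following is a minimal Python sketch of how a scaffold N50 is conventionally computed, together with a back-of-the-envelope check of the reported gene fraction. It is not the authors' pipeline: the function names `n50` and `gene_fraction` are hypothetical, the scaffold lengths in the usage lines are made-up toy values, and the gene-fraction estimate assumes that genes do not overlap.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L
    together cover at least half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first scaffold that brings the cumulative
        # length to at least half of the assembly.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def gene_fraction(n_genes: int, mean_gene_kb: float, assembly_mb: float) -> float:
    """Approximate fraction of the assembly covered by genes.

    Assumption (not from the source): genes do not overlap, so the
    genic span is simply n_genes * mean_gene_kb.
    """
    genic_mb = n_genes * mean_gene_kb / 1000.0  # kb -> Mb
    return genic_mb / assembly_mb


# Toy scaffold lengths (illustrative only, not real data).
toy_n50 = n50([2, 3, 4, 5, 6])

# Figures taken from the abstract: 27,232 genes, 12.6 kb mean length, 870.4 Mb.
approx_fraction = gene_fraction(27232, 12.6, 870.4)
```

With the rounded values from the abstract, `approx_fraction` comes out near 0.394, which agrees with the reported 39.5% to within the rounding of the average gene length.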