Institute of Marine and Environmental Technology, University of Maryland Center for Environmental Science, Baltimore, MD 21202, USA.
Horn Point Laboratory, University of Maryland Center for Environmental Science, Horn Point, MD 21613, USA.
G3 (Bethesda). 2021 Sep 6;11(9). doi: 10.1093/g3journal/jkab212.
The blue crab, Callinectes sapidus (Rathbun, 1896), is an economically, culturally, and ecologically important species found along the temperate and tropical Atlantic coasts of North and South America. A reference genome will enable research on this high-value species. The initial assembly combined 200× coverage Illumina paired-end reads, a 60× 8 kb mate-pair library, and 50× PacBio data using the MaSuRCA assembler, resulting in a 985 Mb assembly with a scaffold N50 of 153 kb. Dovetail Chicago and Hi-C sequencing, together with the 3D-DNA assembler and the Juicebox Assembly Tools, were then used for chromosome-scale scaffolding. The 50 largest scaffolds span 810 Mb, range from 1.5 to 37 Mb in length, and have a repeat content of 36%. The remaining 190 Mb of unplaced sequence is contained in 3921 sequences longer than 10 kb and has a repeat content of 68%. The final assembly N50 is 18.9 Mb for scaffolds and 9317 bases for contigs. Of the 1013 arthropod BUSCOs, ∼88% (888) were complete and single-copy. Using 309 million RNA-seq read pairs from 12 different tissues and developmental stages, 25,249 protein-coding genes were predicted. Between the C. sapidus and Portunus trituberculatus genomes, 41 of the 50 large scaffolds had high nucleotide identity and protein-coding synteny, but 9 scaffolds in both assemblies were not clear matches. The protein-coding genes included 9423 one-to-one putative orthologs, of which 7165 were syntenic between the two crab species. Overall, the two crab genome assemblies show strong similarities at the nucleotide, protein, and chromosome levels and support the blue crab genome as an excellent reference for this important seafood species.
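The assembly statistics above are reported as N50 values. N50 is the largest length L such that sequences of length L or longer together contain at least half of all assembled bases. The same assembly can therefore have a very different N50 for contigs (9317 bases here) than for scaffolds (18.9 Mb). The following is a minimal sketch of that definition, assuming the sequence lengths are already available as a list of integers. The function name `n50` is hypothetical and is not part of the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (hypothetical helper).

    N50 is the largest length L such that sequences of length >= L
    account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases until
    # the running total reaches half of the assembly size.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example: total = 20 bases; 6 + 5 = 11 >= 10, so N50 = 5.
print(n50([2, 3, 4, 5, 6]))  # → 5
```

Comparing `2 * running` against `total` avoids floating-point division when the total length is odd. Applying the function to contig lengths or to scaffold lengths gives the contig N50 or the scaffold N50, respectively.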