Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan.
Institute of Agrobiological Sciences, National Agriculture and Food Research Organization (NARO), 1-2 Owashi, Tsukuba, Ibaraki, 305-8634, Japan.
Insect Biochem Mol Biol. 2019 Apr;107:53-62. doi: 10.1016/j.ibmb.2019.02.002. Epub 2019 Feb 23.
In 2008, the genome assembly and gene models for the domestic silkworm, Bombyx mori, were published by a Japanese-Chinese collaboration. However, that genome assembly contains a non-negligible number of misassembled regions and gaps, owing to the many repetitive sequences in the silkworm genome. The erroneous assembly occasionally causes incorrect gene prediction. Here we performed hybrid assembly based on 140× deep sequencing of long (PacBio) and short (Illumina) reads. The gaps remaining in the initial genome assembly were closed using BAC and Fosmid sequences, giving a new total length of 460.3 Mb with 30 gap regions and N50 values of 16.8 Mb for scaffolds and 12.2 Mb for contigs. More RNA-seq and piRNA-seq reads mapped to the new genome assembly than to the previous version, indicating that the new assembly covers more transcribed regions, including repetitive elements. We performed gene prediction on the new genome assembly using available mRNA and protein sequence data. The number of gene models was 16,880, with an N50 of 2,154 bp. The new gene models reflect more accurate coding sequences and gene sets than the previous ones. The proportion of repetitive elements was also re-estimated using the new genome assembly and was calculated to be 46.8% of the silkworm genome. The new genome assembly and gene models are provided in SilkBase (http://silkbase.ab.a.u-tokyo.ac.jp).
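For readers unfamiliar with the N50 statistic quoted above for scaffolds, contigs, and gene models, the following is a minimal sketch of how such a value is commonly computed. It assumes the usual definition (the length L such that sequences of length L or longer together cover at least half of the total length) and assumes that contigs are obtained by splitting each scaffold at runs of N; the abstract does not describe the authors' actual tooling. The function names and the toy sequences are hypothetical and are not data from the study.

```python
import re


def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed definition: sort lengths from longest to shortest and return
    the length at which the running total first reaches half of the sum.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def contig_lengths(scaffolds):
    """Split each scaffold at runs of N (gap placeholders) and return
    the lengths of the resulting gap-free pieces (assumed to be contigs)."""
    lengths = []
    for seq in scaffolds:
        for piece in re.split(r"[Nn]+", seq):
            if piece:  # skip empty strings from leading/trailing gaps
                lengths.append(len(piece))
    return lengths


# Toy input, not real silkworm data: one scaffold with a gap, one without.
toy_scaffolds = ["ACGTACGTNNNNACGT", "ACGTAC"]

scaffold_n50 = n50([len(s) for s in toy_scaffolds])
contig_n50 = n50(contig_lengths(toy_scaffolds))
print(scaffold_n50, contig_n50)
```

Because gaps break a scaffold into shorter contigs, the contig N50 can never exceed the scaffold N50, which is consistent with the 16.8 Mb and 12.2 Mb figures reported in the abstract.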