Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City, Biological Science Research Center, Southwest University, Chongqing 400715, China.
Department of Bioinformatics, The Basic Medical School of Chongqing Medical University, Chongqing 400016, China.
Int J Mol Sci. 2024 Nov 17;25(22):12341. doi: 10.3390/ijms252212341.
Genome sequences contain the fundamental genetic information that largely determines the biology of a species. Over the past 20 years, high-throughput sequencing technologies and bioinformatics tools have matured, facilitating genome assembly and ushering in the telomere-to-telomere (T2T) era. The species studied here is renowned as a silk-producing insect and is an important model organism that has been studied extensively across many fields of biology. In this study, we present its first T2T genome assembly, built by integrating HiFi, ultra-long ONT, NGS, and Hi-C data. The assembly spans 450,267,439 base pairs across 28 chromosomes and includes annotations for 18,253 protein-coding genes. BUSCO analysis showed that 99.1% of conserved single-copy genes were present, and the consensus quality value (QV) estimated with Merqury was 59.88. Repeat sequences make up 60.77% of the assembly, the highest proportion reported for this species to date. Compared with previously published genomes, our assembly is more complete and of higher quality, particularly in highly homologous tandem regions such as telomeres, rDNA clusters, and Gr gene family regions. Finally, our experience with genome assembly, including sample preparation and assembly strategies for reducing complexity, will provide a valuable reference for T2T genome assembly efforts in other species.
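The Merqury QV reported above is on the Phred scale, so it can be read as a per-base error rate. The Python sketch below converts the reported QV into an error rate and an approximate number of residual errors, and also illustrates the k-mer reasoning described for Merqury: assembly k-mers absent from the reads are treated as errors, and because a k-mer is correct only if all k of its bases are correct, the fraction of read-supported k-mers equals the per-base accuracy raised to the power k. The function names (`error_rate_from_qv`, `qv_from_error_rate`, `kmer_qv`) are hypothetical helpers, not part of Merqury; only the genome length and QV come from the abstract.

```python
import math

def error_rate_from_qv(qv: float) -> float:
    """Convert a Phred-scaled QV into a per-base error rate: 10^(-QV/10)."""
    return 10.0 ** (-qv / 10.0)

def qv_from_error_rate(error_rate: float) -> float:
    """Convert a per-base error rate into a Phred-scaled QV: -10 * log10(E)."""
    if error_rate <= 0.0:
        return math.inf  # no detectable errors: QV is unbounded
    return -10.0 * math.log10(error_rate)

def kmer_qv(asm_only_kmers: int, total_kmers: int, k: int) -> float:
    """k-mer-based QV in the style described for Merqury (hypothetical helper).

    asm_only_kmers: assembly k-mers never seen in the reads (treated as errors)
    total_kmers:    all k-mers in the assembly
    A k-mer is correct only if all k of its bases are correct, so the fraction
    of read-supported k-mers is P**k; solving for P gives per-base accuracy.
    """
    if total_kmers <= 0 or not 0 <= asm_only_kmers <= total_kmers:
        raise ValueError("need 0 <= asm_only_kmers <= total_kmers and total_kmers > 0")
    p_correct = (1.0 - asm_only_kmers / total_kmers) ** (1.0 / k)
    return qv_from_error_rate(1.0 - p_correct)

# Values taken from the abstract.
GENOME_BP = 450_267_439
REPORTED_QV = 59.88

err = error_rate_from_qv(REPORTED_QV)
print(f"per-base error rate: {err:.3e}")
print(f"about one error every {1 / err:,.0f} bp")
print(f"expected errors across the assembly: {err * GENOME_BP:,.0f}")
```

Under the simplifying assumption that errors are spread evenly, a QV of 59.88 corresponds to roughly one error per ~970 kb, or on the order of 460 residual base errors across the 450 Mb assembly. The `kmer_qv` helper only illustrates the idea; real QV values should be taken from Merqury's own output.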