International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu 322000, China.
Center for Evolutionary & Organismal Biology, Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou 311121, China.
Bioinformatics. 2024 Nov 1;40(11). doi: 10.1093/bioinformatics/btae633.
Recent advances in long-read sequencing technologies have greatly facilitated the production of high-quality genome assemblies. Telomere-to-telomere (T2T) gapless assembly has become the new gold standard for genome assembly efforts, and several recent studies have claimed to produce T2T-level reference genomes. However, a universal standard for qualifying a genome assembly as T2T is still missing. Traditional assessment metrics (N50 and its derivatives) cannot distinguish a nearly T2T assembly from a truly T2T assembly in terms of continuity, either globally or locally. Moreover, these metrics are computed independently of the raw reads, so they are easily inflated by artificial manipulation. A tool that evaluates gaplessness at single-nucleotide resolution and reflects true completeness is therefore urgently needed in the era of complete genomes.
Here, we present Genome Continuity Inspector (GCI), a tool designed to assess genome assembly continuity at single-base resolution and to evaluate how close an assembly is to the T2T level. GCI uses multiple aligners to map long reads from various sequencing platforms back to the assembly. By integrating the curated mapping coverage of high-confidence read alignments, GCI identifies potential assembly issues. It also reports GCI scores that quantify overall assembly continuity at the whole-genome or chromosome scale.
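The coverage-based idea described above can be illustrated with a minimal sketch. This is not GCI's implementation or its scoring formula: the function names, the `min_depth` threshold, and the use of N50 over read-supported segments as a continuity summary are all assumptions made for illustration. The sketch takes a per-base depth vector (as might be derived from filtered, high-confidence alignments), flags positions with insufficient support, splits the sequence at those positions, and summarizes the remaining segments.

```python
from typing import List, Tuple


def low_coverage_intervals(depth: List[int], min_depth: int = 1) -> List[Tuple[int, int]]:
    """Return half-open [start, end) intervals where depth < min_depth.

    `min_depth` is a hypothetical threshold, not a GCI parameter.
    """
    intervals: List[Tuple[int, int]] = []
    start = None
    for pos, d in enumerate(depth):
        if d < min_depth:
            if start is None:
                start = pos  # open a new low-coverage run
        elif start is not None:
            intervals.append((start, pos))  # close the current run
            start = None
    if start is not None:
        intervals.append((start, len(depth)))  # run extends to the end
    return intervals


def supported_segments(length: int, gaps: List[Tuple[int, int]]) -> List[int]:
    """Lengths of the read-supported stretches left after cutting at `gaps`.

    `gaps` must be sorted and non-overlapping, as produced above.
    """
    segments: List[int] = []
    prev_end = 0
    for start, end in gaps:
        if start > prev_end:
            segments.append(start - prev_end)
        prev_end = end
    if length > prev_end:
        segments.append(length - prev_end)
    return segments


def n50(lengths: List[int]) -> int:
    """Smallest length L such that segments of length >= L cover half the total."""
    total = sum(lengths)
    running = 0
    for seg in sorted(lengths, reverse=True):
        running += seg
        if running * 2 >= total:
            return seg
    return 0


# Toy example: a 10-bp "contig" with a 2-bp stretch lacking read support.
toy_depth = [5, 6, 4, 0, 0, 7, 8, 9, 3, 2]
toy_gaps = low_coverage_intervals(toy_depth, min_depth=1)
toy_segments = supported_segments(len(toy_depth), toy_gaps)
print(toy_gaps, toy_segments, n50(toy_segments))
```

In this toy case the contig counts as a single 10-bp sequence for a conventional N50, whereas cutting it at the unsupported stretch yields segments of 3 bp and 5 bp. This is the kind of read-backed discrepancy that a metric independent of raw reads cannot reveal.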
The open-source GCI code is freely available on GitHub (https://github.com/yeeus/GCI) under the MIT license.