School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China.
MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China.
Genome Biol. 2023 Mar 28;24(1):58. doi: 10.1186/s13059-023-02900-5.
Significant improvements in long-read sequencing technologies have unlocked complex genomic regions such as centromeres, introducing the centromere annotation problem. Currently, centromeres are annotated semi-manually. Here, we propose HiCAT, a generalizable automatic centromere annotation tool based on hierarchical tandem repeat mining, to facilitate decoding of centromere architecture. We apply HiCAT to simulated datasets and to the human CHM13-T2T and gapless Arabidopsis thaliana genomes. Our results are generally consistent with previous inferences while greatly improving annotation continuity and revealing additional fine structures, demonstrating HiCAT's performance and general applicability.