Brown Max R, Manuel Gonzalez de La Rosa Pablo, Blaxter Mark
School of Life Sciences, Anglia Ruskin University, Cambridge, CB1 1PT, United Kingdom.
Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1RQ, United Kingdom.
Bioinformatics. 2025 Feb 4;41(2). doi: 10.1093/bioinformatics/btaf049.
"tidk" (short for telomere identification toolkit) uses a simple, fast algorithm to scan long DNA reads for the presence of short tandemly repeated DNA in runs, and to aggregate them based on canonical DNA string representation. These are telomeric repeat candidates. Our algorithm is shown to be accurate in genomes for which the telomeric repeat unit is known and is tested across a wide variety of newly assembled genomes to uncover new telomeric repeat units. Tools are provided to identify telomeric repeats de novo, scan genomes for known telomeric repeats, and to visualize telomeric repeats on the assembly. "tidk" is implemented in Rust and is available as a command line tool which can be compiled using the Rust toolchain or downloaded as a binary from bioconda.
The "tidk" Rust crate is freely available under the MIT license (https://crates.io/crates/tidk), and the source code is available at https://github.com/tolkit/telomeric-identifier.
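The aggregation step described in the abstract (grouping tandem repeat runs by a canonical DNA string) can be illustrated with a minimal sketch. This is not taken from the tidk source code: the function names `reverse_complement`, `min_rotation`, `canonical` and `count_tandem_runs`, the `min_copies` threshold, and the choice of "lexicographically smallest rotation across both strands" as the canonical form are all assumptions made for illustration. Input is assumed to be uppercase ASCII DNA.

```rust
use std::collections::HashMap;

/// Reverse complement of an uppercase DNA string (non-ACGT bytes become 'N').
fn reverse_complement(seq: &str) -> String {
    seq.bytes()
        .rev()
        .map(|b| match b {
            b'A' => 'T',
            b'C' => 'G',
            b'G' => 'C',
            b'T' => 'A',
            _ => 'N',
        })
        .collect()
}

/// Lexicographically smallest rotation of a repeat unit.
fn min_rotation(unit: &str) -> String {
    let doubled = unit.repeat(2);
    (0..unit.len())
        .map(|i| doubled[i..i + unit.len()].to_string())
        .min()
        .unwrap_or_default()
}

/// Hypothetical canonical form: the smallest rotation over both strands,
/// so that TTAGGG, GGGTTA and CCCTAA all map to the same key.
fn canonical(unit: &str) -> String {
    let forward = min_rotation(unit);
    let reverse = min_rotation(&reverse_complement(unit));
    forward.min(reverse)
}

/// Scan `seq` for runs of at least `min_copies` consecutive copies of a
/// k-mer and sum the copy counts per canonical repeat unit.
fn count_tandem_runs(seq: &str, k: usize, min_copies: usize) -> HashMap<String, usize> {
    let bytes = seq.as_bytes();
    let mut counts = HashMap::new();
    if k == 0 {
        return counts;
    }
    let mut i = 0;
    while i + k <= bytes.len() {
        let unit = &bytes[i..i + k];
        // Count how many times the k-mer at position i repeats back to back.
        let mut copies = 1;
        while i + (copies + 1) * k <= bytes.len()
            && &bytes[i + copies * k..i + (copies + 1) * k] == unit
        {
            copies += 1;
        }
        if copies >= min_copies {
            *counts.entry(canonical(&seq[i..i + k])).or_insert(0) += copies;
            i += copies * k; // skip past the whole run
        } else {
            i += 1;
        }
    }
    counts
}
```

Under these assumptions, a read carrying `TTAGGG` repeats and a read carrying `CCCTAA` repeats (the same telomeric repeat seen from the opposite strand) contribute to a single candidate key, which is the point of aggregating on a canonical representation.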