tidk: a toolkit to rapidly identify telomeric repeats from genomic datasets.

Authors

Brown Max R, Manuel Gonzalez de La Rosa Pablo, Blaxter Mark

Affiliations

School of Life Sciences, Anglia Ruskin University, Cambridge, CB1 1PT, United Kingdom.

Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1RQ, United Kingdom.

Publication information

Bioinformatics. 2025 Feb 4;41(2). doi: 10.1093/bioinformatics/btaf049.

Abstract

SUMMARY

"tidk" (short for telomere identification toolkit) uses a simple, fast algorithm to scan long DNA reads for the presence of short tandemly repeated DNA in runs, and to aggregate them based on canonical DNA string representation. These are telomeric repeat candidates. Our algorithm is shown to be accurate in genomes for which the telomeric repeat unit is known and is tested across a wide variety of newly assembled genomes to uncover new telomeric repeat units. Tools are provided to identify telomeric repeats de novo, scan genomes for known telomeric repeats, and to visualize telomeric repeats on the assembly. "tidk" is implemented in Rust and is available as a command line tool which can be compiled using the Rust toolchain or downloaded as a binary from bioconda.

AVAILABILITY AND IMPLEMENTATION

The "tidk" Rust crate is freely available under the MIT license (https://crates.io/crates/tidk), and the source code is available at https://github.com/tolkit/telomeric-identifier.
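
The idea described in the summary, finding runs of a short tandemly repeated unit and pooling them under a canonical string representation, can be illustrated with a minimal Python sketch. This is not tidk's own implementation (which is written in Rust). The function names `canonical`, `find_runs` and `candidate_repeats`, the `min_copies` threshold, and the choice of "lexicographically smallest rotation of the unit or of its reverse complement" as the canonical form are all assumptions made for illustration.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical(unit: str) -> str:
    """Hypothetical canonical form: the smallest string among all rotations
    of the unit and all rotations of its reverse complement."""
    forms = (unit, reverse_complement(unit))
    return min(s[i:] + s[:i] for s in forms for i in range(len(s)))


def find_runs(seq: str, k: int, min_copies: int = 3):
    """Yield (start, unit, copies) for each run of at least `min_copies`
    back-to-back copies of the same k-mer."""
    i = 0
    while i + k <= len(seq):
        unit = seq[i:i + k]
        copies, j = 1, i + k
        while seq[j:j + k] == unit:
            copies += 1
            j += k
        if copies >= min_copies:
            yield i, unit, copies
            i = j  # continue scanning after the run
        else:
            i += 1


def candidate_repeats(reads, k: int, min_copies: int = 3) -> Counter:
    """Total the repeat copies per canonical unit across all reads."""
    counts = Counter()
    for read in reads:
        for _, unit, copies in find_runs(read.upper(), k, min_copies):
            if set(unit) <= set("ACGT"):  # skip units containing N etc.
                counts[canonical(unit)] += copies
    return counts


if __name__ == "__main__":
    reads = ["ACGT" + "TTAGGG" * 5 + "ACG", "GG" + "CCCTAA" * 4 + "T"]
    print(candidate_repeats(reads, k=6))
```

In this sketch, the two toy reads carry the same repeat on opposite strands (`TTAGGG` and `CCCTAA`), and both are counted under the single key `AACCCT`. Without canonicalization, each strand and each rotation phase of the same unit would be tallied as a separate candidate.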
