Department of Molecular Medicine and Surgery, Karolinska Institute, Stockholm 171 76, Sweden.
Clinical Genomics Facility, Science for Life Laboratory, Stockholm 171 76, Sweden.
Bioinformatics. 2024 Nov 1;40(11). doi: 10.1093/bioinformatics/btae686.
Repeat elements, such as transposable elements (TEs), are highly repetitive DNA sequences that make up around 50% of the genome. TEs such as Alu, SVA, HERV, and L1 elements can cause disease by disrupting genes, causing frameshift mutations, or altering splicing patterns. These elements are challenging to characterize using short-read genome sequencing because of its limited read length and the repetitive nature of TEs. Long-read genome sequencing (lrGS) enables bridging of TEs, allowing increased resolution across repetitive DNA sequences. lrGS therefore presents an opportunity for improved TE detection and analysis, not only from a research perspective but also for future clinical detection. When choosing an lrGS TE caller, parameters such as runtime, CPU hours, sensitivity, precision, and compatibility with inclusion into pipelines are crucial for efficient detection.
We therefore developed sTELLeR, (s) Transposable ELement in Long (e) Read, for accurate, fast, and effective TE detection. In particular, sTELLeR exhibits higher precision and sensitivity for calling Alu elements than similar tools. The caller is 5-48× as fast as competing callers and uses <2% of their CPU hours. The caller is haplotype aware and outputs results in a variant call format (VCF) file, enabling compatibility with other variant callers and downstream analysis.
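For readers less familiar with the two accuracy metrics named above, the following is a minimal sketch of how precision and sensitivity could be computed for a set of TE calls against a truth set. It is not taken from sTELLeR or from the benchmarking in the paper: the function names, the (chromosome, position) representation of a call, and the 50 bp matching window are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation of a TE insertion site: (chromosome, position).
Site = Tuple[str, int]


def match_calls(calls: Iterable[Site], truth: Iterable[Site],
                window: int = 50) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) for calls compared with a truth set.

    A call counts as a true positive if an unmatched truth site lies on
    the same chromosome within `window` bp. The window size is an
    assumed value, not one reported by the authors.
    """
    call_list: List[Site] = list(calls)
    unmatched: List[Site] = list(truth)
    tp = 0
    for chrom, pos in call_list:
        hit = next((t for t in unmatched
                    if t[0] == chrom and abs(t[1] - pos) <= window), None)
        if hit is not None:
            unmatched.remove(hit)  # each truth site is matched at most once
            tp += 1
    fp = len(call_list) - tp   # calls with no truth site nearby
    fn = len(unmatched)        # truth sites that no call matched
    return tp, fp, fn


def precision(tp: int, fp: int) -> float:
    """Fraction of reported calls that are true: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true sites that were reported: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0
```

As a usage example, with calls at chr1:100, chr1:5000, and chr2:300 and truth sites at chr1:120, chr2:310, and chr3:10, two calls match, one call is a false positive, and one truth site is missed, so both precision and sensitivity are 2/3.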
sTELLeR is a Python-based tool and is available at https://github.com/kristinebilgrav/sTELLeR. Altogether, we show that sTELLeR is a fast, sensitive, and precise caller for detection of TEs, and that it can easily be implemented into variant calling workflows.