REPrise: de novo interspersed repeat detection using inexact seeding.

Authors

Takeda Atsushi, Nonaka Daisuke, Imazu Yuta, Fukunaga Tsukasa, Hamada Michiaki

Affiliations

Department of Electrical Engineering and Bioscience, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, 1698555, Japan.

Computational Bio Big-Data Open Innovation Laboratory, AIST-Waseda University, Tokyo, 1698555, Japan.

Publication information

Mob DNA. 2025 Apr 3;16(1):16. doi: 10.1186/s13100-025-00353-0.

DOI: 10.1186/s13100-025-00353-0

PMID: 40181468

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11966803/

Abstract

BACKGROUND

Interspersed repeats occupy a large part of many eukaryotic genomes, and thus their accurate annotation is essential for various genome analyses. Database-free de novo repeat detection approaches are powerful for annotating genomes that lack well-curated repeat databases. However, existing tools do not yet have sufficient repeat detection performance.

RESULTS

In this study, we developed REPrise, a de novo interspersed repeat detection software program based on a seed-and-extension method. Although the algorithm of REPrise is similar to that of RepeatScout, which is currently the de facto standard tool, we incorporated three unique techniques into REPrise: inexact seeding, affine gap scoring and loose masking. Analyses of rice and simulation genome datasets showed that REPrise outperformed RepeatScout in terms of sensitivity, especially when the repeat sequences contained many mutations. Furthermore, when applied to the complete human genome dataset T2T-CHM13, REPrise demonstrated the potential to detect novel repeat sequence families.

CONCLUSION

REPrise can detect interspersed repeats with high sensitivity even in long genomes. Our software enhances repeat annotation in diverse genomic studies, contributing to a deeper understanding of genomic structures.
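The inexact seeding idea named in the Results can be illustrated with a small sketch. The abstract does not describe how REPrise implements it, so the code below is only an assumption-laden illustration of the general concept: a k-mer's frequency is counted over all occurrences within a fixed number of substitutions, rather than over exact matches only. The function names `hamming_neighbors` and `inexact_seed_counts`, the parameters `k` and `max_mismatches`, and the brute-force neighbor enumeration are all hypothetical and are not taken from REPrise or RepeatScout.

```python
from collections import defaultdict
from itertools import combinations, product


def hamming_neighbors(kmer, max_mismatches, alphabet="ACGT"):
    """Yield kmer and every string within max_mismatches substitutions of it."""
    seen = {kmer}
    yield kmer
    for d in range(1, max_mismatches + 1):
        # Choose which d positions to substitute, then which letters to place there.
        for positions in combinations(range(len(kmer)), d):
            for letters in product(alphabet, repeat=d):
                chars = list(kmer)
                for pos, letter in zip(positions, letters):
                    chars[pos] = letter
                neighbor = "".join(chars)
                if neighbor not in seen:
                    seen.add(neighbor)
                    yield neighbor


def inexact_seed_counts(sequence, k, max_mismatches):
    """Count, for each k-mer in sequence, its occurrences allowing substitutions.

    Illustrative only: brute-force enumeration, substitutions only (no indels),
    forward strand only. Not the REPrise implementation.
    """
    exact = defaultdict(int)
    for i in range(len(sequence) - k + 1):
        exact[sequence[i:i + k]] += 1

    counts = {}
    for kmer in exact:
        # .get() avoids adding new keys to the defaultdict while iterating over it.
        counts[kmer] = sum(
            exact.get(n, 0) for n in hamming_neighbors(kmer, max_mismatches)
        )
    return counts


if __name__ == "__main__":
    seq = "ACGTACCTACGT"  # two exact copies of ACGT plus one mutated copy, ACCT
    print(inexact_seed_counts(seq, k=4, max_mismatches=0)["ACGT"])  # exact seeding
    print(inexact_seed_counts(seq, k=4, max_mismatches=1)["ACGT"])  # inexact seeding
```

In this toy sequence, exact counting sees only the two identical copies of `ACGT`, while allowing one mismatch also credits the diverged copy `ACCT`. That is the intuition behind why inexact seeds can help when repeat copies carry many mutations, as the abstract reports.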

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df0c/11966803/fd97faaa7300/13100_2025_353_Fig1_HTML.jpg
