ULTRA: A Model Based Tool to Detect Tandem Repeats.

Authors

Olson Daniel, Wheeler Travis

Affiliation

University of Montana, Missoula, Montana

Publication

ACM BCB. 2018 Aug-Sep;2018:37-46. doi: 10.1145/3233547.3233604.

Abstract

In biological sequences, tandem repeats consist of tens to hundreds of residues of a repeated pattern, such as atgatgatgatgatg ('atg' repeated), often the result of replication slippage. Over time, these repeats decay so that the original sharp pattern of repetition is somewhat obscured, but even degenerate repeats pose a problem for sequence annotation: when two sequences both contain shared patterns of similar repetition, the result can be a false signal of sequence homology. We describe an implementation of a new hidden Markov model for detecting tandem repeats that shows substantially improved sensitivity to labeling decayed repetitive regions, presents low and reliable false annotation rates across a wide range of sequence composition, and produces scores that follow a stable distribution. On typical genomic sequence, the time and memory requirements of the resulting tool (ULTRA) are competitive with the most heavily used tool for repeat masking. ULTRA is released under an open source license and lays the groundwork for inclusion of the model in sequence alignment tools and annotation pipelines.
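To make the 'atg' example concrete, the following is a minimal Python sketch of a naive exact-match scan for tandem repeats. It is not ULTRA's hidden Markov model and is not taken from the paper; the function name `find_exact_tandem_repeats` and the parameters `max_period` and `min_copies` are hypothetical choices made for illustration only.

```python
def find_exact_tandem_repeats(seq, max_period=6, min_copies=3):
    """Naive illustration only (NOT the ULTRA model).

    For each candidate period p, find maximal runs where seq[i] == seq[i + p]
    and report (start, end, unit) for runs spanning at least min_copies units.
    """
    seq = seq.lower()
    hits = []
    for period in range(1, max_period + 1):
        i = 0
        while i + period < len(seq):
            j = i
            # Extend the run while each residue equals the one `period` ahead.
            while j + period < len(seq) and seq[j] == seq[j + period]:
                j += 1
            span = j - i + period  # length of the periodic region
            if j > i and span >= period * min_copies:
                hits.append((i, i + span, seq[i:i + period]))
            i = j + 1
    return hits


if __name__ == "__main__":
    # Flanked copy of the abstract's example; result includes (2, 17, 'atg').
    print(find_exact_tandem_repeats("ccatgatgatgatgatgtt"))
    # One substitution (g -> c) fragments the exact-match result.
    print(find_exact_tandem_repeats("atgatgatcatgatgatg"))
```

A single substitution is enough to break the exact-match run in the second call, which illustrates why decayed repeats are hard to label with simple pattern matching and why a probabilistic model that tolerates mismatches is attractive.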
