TRStalker: an efficient heuristic for finding fuzzy tandem repeats.

Affiliation

CNR, Istituto di Informatica e Telematica, Via Moruzzi 1, 56124 Pisa, Italy.

Publication information

Bioinformatics. 2010 Jun 15;26(12):i358-66. doi: 10.1093/bioinformatics/btq209.

DOI:10.1093/bioinformatics/btq209
PMID:20529928
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2881393/
Abstract

MOTIVATION

Genomes of higher eukaryotic organisms contain a substantial amount of repeated sequence. Tandem repeats (TRs) constitute a large class of repetitive sequences that originate via phenomena such as replication slippage and are characterized by close spatial contiguity. They play an important role in several molecular regulatory mechanisms and in several diseases (e.g. the trinucleotide repeat disorders). While current methods are rather effective for TRs with a low or medium level of divergence, detecting TRs with higher divergence (fuzzy TRs) remains an open problem. Detecting fuzzy TRs is a prerequisite for enriching our view of their role in regulatory mechanisms and diseases. Fuzzy TRs are also important tools for shedding light on the evolutionary history of the genome, where higher divergence correlates with more remote duplication events.

RESULTS

We have developed an algorithm, TRStalker, that aims to efficiently detect TRs that are hard to find because of their inherent fuzziness, which is due to high levels of base substitutions, insertions and deletions. To attain this goal, we developed heuristics to solve a Steiner version of the problem, in which fuzziness is measured with respect to a motif string that is not necessarily present in the input string. This problem is akin to the 'generalized median string' problem, which is known to be NP-hard. Experiments with both synthetic and biological sequences demonstrate that our method performs better than the current state of the art for fuzzy TRs and that fuzzy TRs of the type we detect are indeed present in important biological sequences.
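
To make the "Steiner version" concrete, the sketch below contrasts two ways of choosing a motif for a set of already-separated repeat copies: picking the best copy that occurs in the input, versus searching all strings of a fixed length, including ones that never occur in the input. This is not TRStalker's heuristic, which the abstract does not describe in detail. The function names, the unit-cost edit distance, and the toy copies are all assumptions made for illustration. The exhaustive search is exponential in the motif length, which is consistent with the NP-hardness noted above and is why a heuristic is needed in practice.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def total_distance(motif: str, copies: list[str]) -> int:
    """Fuzziness of a repeat measured against one candidate motif (assumed measure)."""
    return sum(edit_distance(motif, c) for c in copies)


def best_input_copy(copies: list[str]) -> str:
    """Restricted version: the motif must be one of the observed copies."""
    return min(copies, key=lambda c: total_distance(c, copies))


def steiner_motif(copies: list[str], length: int, alphabet: str = "ACGT") -> str:
    """Steiner version by brute force: the motif may be any string of this length.

    Enumerates len(alphabet) ** length candidates, so it is only usable
    for very short motifs.
    """
    candidates = ("".join(p) for p in product(alphabet, repeat=length))
    return min(candidates, key=lambda m: total_distance(m, copies))


if __name__ == "__main__":
    # Hypothetical copies of a divergent repeat; each differs from ACGT by one base.
    copies = ["ACGA", "TCGT", "ACCT", "AGGT"]
    restricted = best_input_copy(copies)
    free = steiner_motif(copies, length=4)
    print(restricted, total_distance(restricted, copies))
    print(free, total_distance(free, copies))
```

In this toy case the motif ACGT is absent from the input yet lies closer to all four copies than any copy does to the others, which is the situation the Steiner formulation is meant to capture.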

AVAILABILITY

TRStalker will be integrated in the web-based TRs Discovery Service (TReaDS) at bioalgo.iit.cnr.it.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
