Repeat or not repeat?--Statistical validation of tandem repeat prediction in genomic sequences.

Affiliation

Computer Science Department, ETH Zürich, Universitätsstrasse 6, CH-8092 Zürich, Switzerland.

Publication details

Nucleic Acids Res. 2012 Nov 1;40(20):10005-17. doi: 10.1093/nar/gks726. Epub 2012 Aug 25.

DOI: 10.1093/nar/gks726
PMID: 22923522
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3488214/
Abstract

Tandem repeats (TRs) represent one of the most prevalent features of genomic sequences. Due to their abundance and functional significance, a plethora of detection tools has been devised over the last two decades. Despite the longstanding interest, TR detection is still not resolved. Our large-scale tests reveal that current detectors produce different, often nonoverlapping inferences, reflecting characteristics of the underlying algorithms rather than the true distribution of TRs in genomic data. Our simulations show that the power of detecting TRs depends on the degree of their divergence, and on repeat characteristics such as the length of the minimal repeat unit and their number in tandem. To reconcile the diverse predictions of current algorithms, we propose and evaluate several statistical criteria for measuring the quality of predicted repeat units. In particular, we propose a model-based phylogenetic classifier, entailing a maximum-likelihood estimation of the repeat divergence. Applied in conjunction with state-of-the-art detectors, our statistical classification scheme for inferred repeats allows false-positive predictions to be filtered out. Since different algorithms appear to specialize at predicting TRs with certain properties, we advise applying multiple detectors with subsequent filtering to obtain the most complete set of genuine repeats.
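The abstract's point that detection power falls as repeat copies diverge can be illustrated with a toy simulation. The Python sketch below is not the authors' simulation pipeline: it assumes substitution-only evolution of every copy directly from one random ancestral unit (no indels, no phylogeny among copies) and uses a deliberately naive, hypothetical detector that counts how often `seq[i] == seq[i + period]`. The function names and the 0.8 match threshold are illustrative assumptions.

```python
import random

ALPHABET = "ACGT"


def simulate_tandem_repeat(unit_len, copies, divergence, rng):
    """Concatenate `copies` mutated copies of one random ancestral DNA unit.

    Each site of each copy is independently substituted with probability
    `divergence` (substitutions only, no indels: a simplifying assumption).
    """
    ancestor = [rng.choice(ALPHABET) for _ in range(unit_len)]
    out = []
    for _ in range(copies):
        for base in ancestor:
            site = base
            if rng.random() < divergence:
                # Substitute with one of the three other bases, uniformly.
                site = rng.choice([b for b in ALPHABET if b != base])
            out.append(site)
    return "".join(out)


def period_match_fraction(seq, period):
    """Fraction of positions i with seq[i] == seq[i + period] (naive detector)."""
    n = len(seq) - period
    if n <= 0:
        return 0.0
    return sum(seq[i] == seq[i + period] for i in range(n)) / n


def detection_power(unit_len, copies, divergence,
                    threshold=0.8, trials=200, seed=1):
    """Share of simulated repeats whose match fraction reaches `threshold`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seq = simulate_tandem_repeat(unit_len, copies, divergence, rng)
        if period_match_fraction(seq, unit_len) >= threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Power of the toy detector for 5 copies of a 10-bp unit.
    for d in (0.0, 0.05, 0.1, 0.2, 0.3):
        print(f"divergence={d:.2f}  power={detection_power(10, 5, d):.2f}")
```

With `divergence` 0 every copy is identical, so `detection_power(10, 5, 0.0)` is 1.0. Under this toy model two copies agree at a site with probability (1 - d)^2 + d^2/3, so the expected match fraction, and with it the share of repeats passing a fixed threshold, drops quickly as d grows.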

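The filtering idea in the abstract can likewise be sketched as a statistical test on predicted repeat units. This is a simplified stand-in, not the model-based phylogenetic classifier proposed in the paper: it assumes each prediction is already given as aligned, equal-length DNA repeat units, measures their similarity by mean pairwise p-distance, and uses a permutation test to ask whether shuffling the same residues into units of the same length gives equally similar units. The Jukes-Cantor (JC69) pairwise distance, which is the maximum-likelihood distance between two aligned DNA sequences under that model, is reported as a crude divergence estimate. Detector labels such as `detA` and the `alpha` cutoff are hypothetical.

```python
import itertools
import math
import random


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length units."""
    if len(a) != len(b):
        raise ValueError("repeat units must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jc69_distance(p: float) -> float:
    """JC69 distance, the ML estimate for a pair of aligned DNA sequences."""
    if p >= 0.75:
        return math.inf  # saturated: the JC69 estimate is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def mean_pairwise(units, dist):
    """Average of dist(a, b) over all unordered pairs of units (needs >= 2 units)."""
    pairs = list(itertools.combinations(units, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def shuffle_p_value(units, n_perm=999, seed=0):
    """Permutation p-value: how often shuffled residues form units at least as similar."""
    rng = random.Random(seed)
    unit_len = len(units[0])
    observed = mean_pairwise(units, p_distance)
    pool = list("".join(units))
    as_similar = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        shuffled = ["".join(pool[i:i + unit_len])
                    for i in range(0, len(pool), unit_len)]
        if mean_pairwise(shuffled, p_distance) <= observed:
            as_similar += 1
    return (as_similar + 1) / (n_perm + 1)


def filter_predictions(predictions, alpha=0.01):
    """Pool (detector, units) predictions, drop duplicates, keep significant ones.

    Returns (detector, units, mean JC69 divergence) for each kept prediction.
    """
    kept, seen = [], set()
    for detector, units in predictions:
        key = tuple(units)
        if key in seen or len(units) < 2:
            continue  # already tested, or too few units to compare
        seen.add(key)
        if shuffle_p_value(units) < alpha:
            divergence = mean_pairwise(
                units, lambda a, b: jc69_distance(p_distance(a, b)))
            kept.append((detector, units, divergence))
    return kept
```

For instance, four identical units `"ACGTTGCA"` get p = 0.001 (the smallest value possible with 999 shuffles), while the maximally dissimilar units `"AAAAAAAA"`, `"CCCCCCCC"`, `"GGGGGGGG"`, `"TTTTTTTT"` get p = 1.0 and are dropped. Pooling predictions from several detectors before filtering follows the abstract's advice; candidates reported by more than one detector are collapsed so each is tested once.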

Similar articles

1. Repeat or not repeat?--Statistical validation of tandem repeat prediction in genomic sequences.
   Nucleic Acids Res. 2012 Nov 1;40(20):10005-17. doi: 10.1093/nar/gks726. Epub 2012 Aug 25.
2. In search of the boundary between repetitive and non-repetitive protein sequences.
   Biochem Soc Trans. 2015 Oct;43(5):807-11. doi: 10.1042/BST20150073.
3. TRDistiller: a rapid filter for enrichment of sequence datasets with proteins containing tandem repeats.
   J Struct Biol. 2014 Jun;186(3):386-91. doi: 10.1016/j.jsb.2014.03.013. Epub 2014 Mar 26.
4. Genome-wide analysis of tandem repeats in Daphnia pulex--a comparative approach.
   BMC Genomics. 2010 Apr 30;11:277. doi: 10.1186/1471-2164-11-277.
5. Advancing genomic technologies and clinical awareness accelerates discovery of disease-associated tandem repeat sequences.
   Genome Res. 2022 Jan;32(1):1-27. doi: 10.1101/gr.269530.120. Epub 2021 Dec 29.
6. REP2: A Web Server to Detect Common Tandem Repeats in Protein Sequences.
   J Mol Biol. 2021 May 28;433(11):166895. doi: 10.1016/j.jmb.2021.166895. Epub 2021 Feb 24.
7. TRStalker: an efficient heuristic for finding fuzzy tandem repeats.
   Bioinformatics. 2010 Jun 15;26(12):i358-66. doi: 10.1093/bioinformatics/btq209.
8. Statistical approaches to detecting and analyzing tandem repeats in genomic sequences.
   Front Bioeng Biotechnol. 2015 Mar 17;3:31. doi: 10.3389/fbioe.2015.00031. eCollection 2015.
9. Deep conservation of human protein tandem repeats within the eukaryotes.
   Mol Biol Evol. 2014 May;31(5):1132-48. doi: 10.1093/molbev/msu062. Epub 2014 Feb 3.
10. Graph-based modeling of tandem repeats improves global multiple sequence alignment.
   Nucleic Acids Res. 2013 Sep;41(17):e162. doi: 10.1093/nar/gkt628. Epub 2013 Jul 22.

Cited by

1. Accurate detection of tandem repeats exposes ubiquitous reuse of biological sequences.
   Nucleic Acids Res. 2025 Sep 5;53(17). doi: 10.1093/nar/gkaf866.
2. Tandem Repeats Provide Evidence for Convergent Evolution to Similar Protein Structures.
   Genome Biol Evol. 2025 Feb 3;17(2). doi: 10.1093/gbe/evaf013.
3. Discovery and Analysis of Repeat and Low-Complexity Architectures in Proteins and Their Conserved Evolutionary Relationships Using Self-Homology Dot Plots.
   Methods Mol Biol. 2025;2870:95-116. doi: 10.1007/978-1-0716-4213-9_7.
4. A novel hypervariable variable number tandem repeat in the dopamine transporter gene ().
   Life Sci Alliance. 2023 Feb 8;6(4). doi: 10.26508/lsa.202201677. Print 2023 Apr.
5. TRAL 2.0: Tandem Repeat Detection With Circular Profile Hidden Markov Models and Evolutionary Aligner.
   Front Bioinform. 2021 Jun 25;1:691865. doi: 10.3389/fbinf.2021.691865. eCollection 2021.
6. Beyond Microsatellite Instability: Intrinsic Disorder as a Potential Link Between Protein Short Tandem Repeats and Cancer.
   Front Bioinform. 2021 Jun 8;1:685844. doi: 10.3389/fbinf.2021.685844. eCollection 2021.
7. The Effect of Mutations in the TPR and Ankyrin Families of Alpha Solenoid Repeat Proteins.
   Front Bioinform. 2021 Jul 6;1:696368. doi: 10.3389/fbinf.2021.696368. eCollection 2021.
8. Whole Genome Sequencing Analysis of Effects of CRISPR/Cas9 in : A Budding Yeast in Distress.
   J Fungi (Basel). 2022 Sep 21;8(10):992. doi: 10.3390/jof8100992.
9. Mutation and selection processes regulating short tandem repeats give rise to genetic and phenotypic diversity across species.
   J Evol Biol. 2023 Feb;36(2):321-336. doi: 10.1111/jeb.14106. Epub 2022 Oct 26.
10. The Repeating, Modular Architecture of the HtrA Proteases.
   Biomolecules. 2022 Jun 7;12(6):793. doi: 10.3390/biom12060793.