

RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections.

Authors

Castro-Mondragon Jaime Abraham, Jaeger Sébastien, Thieffry Denis, Thomas-Chollier Morgane, van Helden Jacques

Affiliations

Aix Marseille Univ, INSERM, TAGC, Theory and Approaches of Genomic Complexity, UMR_S 1090, Marseille, France.

Aix Marseille Univ, CNRS, INSERM, CIML, Marseille, France.

Publication

Nucleic Acids Res. 2017 Jul 27;45(13):e119. doi: 10.1093/nar/gkx314.

DOI: 10.1093/nar/gkx314
PMID: 28591841
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5737723/
Abstract

Transcription factor (TF) databases contain multitudes of TF binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions of the same motifs, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees and automatically creates non-redundant TFBM collections. A feature unique to matrix-clustering is its dynamic visualisation of aligned TFBMs, together with its capability to treat multiple collections from various sources simultaneously. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families and drastically reduces motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/) and is accessible through a user-friendly web interface or from the command line for integration into pipelines.
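The abstract describes grouping similar TFBMs into trees and then deriving a non-redundant collection. The Python sketch below illustrates that general idea under stated assumptions; it is not the RSAT implementation. It assumes motifs are position probability matrices with columns in A, C, G, T order, scores each pair by the best Pearson correlation over ungapped offsets on both strands multiplied by the fraction of the aligned span that overlaps (an illustrative width-normalized score), groups motifs by average-linkage agglomerative clustering stopped at a hypothetical threshold of 0.5, and keeps one medoid per cluster. All helper names (`from_consensus`, `cluster_motifs`, `representatives`, etc.), the toy 0.7/0.1 matrices and the threshold are hypothetical; picking a medoid is a simplification chosen here, and the actual tool may build cluster-level motifs differently.

```python
import math

BASES = "ACGT"  # assumed column order for every matrix


def from_consensus(seq, p=0.7):
    """Hypothetical helper: toy probability matrix from a consensus string."""
    q = (1 - p) / 3
    return [[p if base == b else q for b in BASES] for base in seq]


def revcomp(m):
    """Reverse the columns; reversing ACGT order swaps A<->T and C<->G."""
    return [col[::-1] for col in reversed(m)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    vx = sum((u - mx) ** 2 for u in x)
    vy = sum((v - my) ** 2 for v in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def best_width_normalized_cor(a, b, min_overlap):
    """Best correlation over ungapped offsets, scaled by overlap / total aligned width."""
    best = -1.0
    # Offset k aligns column i of a with column i - k of b.
    for k in range(min_overlap - len(b), len(a) - min_overlap + 1):
        start, end = max(0, k), min(len(a), k + len(b))
        overlap = end - start
        if overlap < min_overlap:
            continue
        xa = [p for col in a[start:end] for p in col]
        xb = [p for col in b[start - k:end - k] for p in col]
        width = max(len(a), k + len(b)) - min(0, k)
        best = max(best, pearson(xa, xb) * overlap / width)
    return best


def similarity(a, b, min_overlap=4):
    """Score both strands of b against a and keep the better one."""
    mo = min(min_overlap, len(a), len(b))
    return max(best_width_normalized_cor(a, b, mo),
               best_width_normalized_cor(a, revcomp(b), mo))


def cluster_motifs(motifs, threshold=0.5, min_overlap=4):
    """Average-linkage agglomerative clustering, stopped when no pair reaches threshold."""
    n = len(motifs)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = similarity(motifs[i], motifs[j], min_overlap)
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best, pair = -math.inf, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                scores = [sim[i][j] for i in clusters[x] for j in clusters[y]]
                avg = sum(scores) / len(scores)
                if avg > best:
                    best, pair = avg, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return clusters, sim


def representatives(clusters, sim):
    """Keep one medoid per cluster: the member with the highest total similarity to the others."""
    return [max(members, key=lambda i: sum(sim[i][j] for j in members if j != i))
            for members in clusters]


# Toy input: a motif, an extended copy, its reverse complement, and an unrelated motif.
motifs = [from_consensus(s) for s in ("AAAGGC", "TAAAGGC", "GCCTTT", "TGTGTG")]
clusters, sim = cluster_motifs(motifs)
print(clusters)  # [[0, 1, 2], [3]]
print(representatives(clusters, sim))
```

Average linkage is used in this sketch because its merge levels never increase in similarity as clustering proceeds, so stopping at the threshold gives the same groups as cutting the full tree at that height.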


Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a8f5/5737723/afd277d40dc8/gkx314fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a8f5/5737723/4a04077d6c75/gkx314fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a8f5/5737723/79ea3cebfea9/gkx314fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a8f5/5737723/4e704d23ef76/gkx314fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a8f5/5737723/8f81621cb537/gkx314fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a8f5/5737723/8afa821a6c02/gkx314fig6.jpg

Similar articles

1
RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections.
Nucleic Acids Res. 2017 Jul 27;45(13):e119. doi: 10.1093/nar/gkx314.
2
RSAT::Plants: Motif Discovery in ChIP-Seq Peaks of Plant Genomes.
Methods Mol Biol. 2016;1482:297-322. doi: 10.1007/978-1-4939-6396-6_19.
3
abc4pwm: affinity based clustering for position weight matrices in applications of DNA sequence analysis.
BMC Bioinformatics. 2022 Mar 3;23(1):83. doi: 10.1186/s12859-022-04615-z.
4
A novel Bayesian DNA motif comparison method for clustering and retrieval.
PLoS Comput Biol. 2008 Feb 29;4(2):e1000010. doi: 10.1371/journal.pcbi.1000010.
5
RSAT 2015: Regulatory Sequence Analysis Tools.
Nucleic Acids Res. 2015 Jul 1;43(W1):W50-6. doi: 10.1093/nar/gkv362. Epub 2015 Apr 22.
6
RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets.
Nucleic Acids Res. 2012 Feb;40(4):e31. doi: 10.1093/nar/gkr1104. Epub 2011 Dec 8.
7
MATLIGN: a motif clustering, comparison and matching tool.
BMC Bioinformatics. 2007 Jun 8;8:189. doi: 10.1186/1471-2105-8-189.
8
RSAT 2011: regulatory sequence analysis tools.
Nucleic Acids Res. 2011 Jul;39(Web Server issue):W86-91. doi: 10.1093/nar/gkr377.
9
SeqGL Identifies Context-Dependent Binding Signals in Genome-Wide Regulatory Element Maps.
PLoS Comput Biol. 2015 May 27;11(5):e1004271. doi: 10.1371/journal.pcbi.1004271. eCollection 2015 May.
10
RSAT 2022: regulatory sequence analysis tools.
Nucleic Acids Res. 2022 Jul 5;50(W1):W670-W676. doi: 10.1093/nar/gkac312.

Cited by

1
A transcription factor ensemble orchestrates bundle sheath expression in rice.
Nat Commun. 2025 Jul 31;16(1):7040. doi: 10.1038/s41467-025-62087-0.
2
NKX2-5 congenital heart disease mutations show diverse loss and gain of epigenomic, biochemical and chromatin search functions underpinning pathogenicity.
bioRxiv. 2025 Jun 20:2025.06.20.659510. doi: 10.1101/2025.06.20.659510.
3
Iterative deep learning design of human enhancers exploits condensed sequence grammar to achieve cell-type specificity.
Cell Syst. 2025 Jun 4:101302. doi: 10.1016/j.cels.2025.101302.
4
Transcriptome analysis reveals a de novo DNA element that may interact with chromatin-associated proteins in Plasmodium berghei during erythrocytic development.
Sci Rep. 2025 May 28;15(1):18621. doi: 10.1038/s41598-025-03586-4.
5
A transcription factor module mediating C4 photosynthesis in the Brassicaceae.
EMBO Rep. 2025 May 1. doi: 10.1038/s44319-025-00461-1.
6
Multiplexed CRISPRi Reveals a Transcriptional Switch Between KLF Activators and Repressors in the Maturing Neocortex.
bioRxiv. 2025 Feb 15:2025.02.07.636951. doi: 10.1101/2025.02.07.636951.
7
The regulatory landscape of 5' UTRs in translational control during zebrafish embryogenesis.
Dev Cell. 2025 May 19;60(10):1498-1515.e8. doi: 10.1016/j.devcel.2024.12.038. Epub 2025 Jan 15.
8
Transcriptional regulation of the piRNA pathway by Ovo in animal ovarian germ cells.
Genes Dev. 2025 Feb 3;39(3-4):221-241. doi: 10.1101/gad.352120.124.
9
Regulation of meristem and hormone function revealed through analysis of directly-regulated SHOOT MERISTEMLESS target genes.
Sci Rep. 2025 Jan 2;15(1):240. doi: 10.1038/s41598-024-83985-1.
10
Identifying transcription factors with cell-type specific DNA binding signatures.
BMC Genomics. 2024 Oct 14;25(1):957. doi: 10.1186/s12864-024-10859-1.