Insights gained from a comprehensive all-against-all transcription factor binding motif benchmarking study.

Affiliations

School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015, Lausanne, Switzerland.

Swiss Institute of Bioinformatics (SIB), CH-1015, Lausanne, Switzerland.

Publication information

Genome Biol. 2020 May 11;21(1):114. doi: 10.1186/s13059-020-01996-3.

DOI:10.1186/s13059-020-01996-3

PMID:32393327

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7212583/

Abstract

BACKGROUND

A positional weight matrix (PWM) is the de facto standard model for describing the DNA-binding specificity of a transcription factor (TF). PWMs inferred from in vivo or in vitro data are stored in many databases and used in a wide range of biological applications. This calls for comprehensive benchmarking of public PWM models against large experimental reference sets.
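To make the PWM idea concrete, the following is a minimal Python sketch (not the authors' code) of one common way to turn a count matrix into a log-odds PWM and score a sequence with it. The A/C/G/T column order, the pseudocount of 0.25, the uniform background and the function names `counts_to_pwm` and `max_pwm_score` are illustrative assumptions; taking the best window score over both strands is only one of several ways to summarize a sequence.

```python
import math

BASES = "ACGT"  # assumed column order of every PWM row
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def counts_to_pwm(counts, pseudocount=0.25, background=None):
    """Convert a count matrix (one row of 4 counts per motif position) into a log2-odds PWM."""
    if background is None:
        background = [0.25] * 4  # assumption: uniform nucleotide background
    pwm = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        # log2(P(base at this position) / P(base in background))
        pwm.append([math.log2(((c + pseudocount) / total) / bg)
                    for c, bg in zip(row, background)])
    return pwm


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def max_pwm_score(pwm, seq):
    """Best log-odds score of any motif-length window on either strand.

    Assumes seq contains only A/C/G/T (case-insensitive); returns None if seq is shorter than the motif.
    """
    seq = seq.upper()
    width = len(pwm)
    best = None
    for strand in (seq, reverse_complement(seq)):
        for start in range(len(strand) - width + 1):
            score = sum(pwm[i][BASES.index(strand[start + i])] for i in range(width))
            if best is None or score > best:
                best = score
    return best
```

For example, a three-position matrix whose counts favour C, A, T scores `"GGCATGG"` and `"ATG"` equally high, because the reverse complement of `"ATG"` is `"CAT"`.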

RESULTS

Here we report results from all-against-all benchmarking of PWM models for DNA binding sites of human TFs on a large compilation of in vitro (HT-SELEX, PBM) and in vivo (ChIP-seq) binding data. We observe that the best-performing PWM for a given TF often belongs to another TF, usually from the same family. Occasionally, binding specificity correlates with the structural class of the DNA-binding domain, as indicated by good cross-family performance measures. Selecting family-representative motifs by benchmarking is more effective than approaches based on motif clustering. Overall, in vitro and in vivo performance measures agree well. However, for some in vivo experiments, the best-performing PWM is assigned to an unrelated TF, indicating a binding mode that involves protein-protein cooperativity.
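One common way to turn PWM scores into a performance measure is the ROC AUC: score a set of positive sequences (for instance ChIP-seq peak regions) and a set of negative or background sequences, then estimate how often a positive outscores a negative. The sketch below is an illustration under that assumption, not the study's benchmarking protocol; how negatives are chosen and the function name `roc_auc` are hypothetical.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation). This pairwise version is
    O(n * m); a rank-based implementation is preferable for large sequence sets.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

An AUC of 1.0 means every positive scored above every negative, while 0.5 corresponds to no discrimination.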

CONCLUSIONS

In an all-against-all setting, we compute more than 18 million performance measure values for different PWM-experiment combinations and offer these results as a public resource to the research community. The benchmarking protocols are provided via a web interface and as Docker images. The methods and results from this study may help others make better use of public TF specificity models, as well as public TF binding data sets.
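Given an all-against-all table of performance values, picking the best PWM for each experiment and choosing a family-representative motif by benchmarking both reduce to simple aggregations. The sketch below assumes a complete table keyed by (motif, experiment), uses the mean over a family's experiments as the selection criterion, and fills the table with hypothetical toy values; all names and numbers are illustrative, not results from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy performance values (e.g. AUCs); NOT data from the study.
TOY_PERF = {
    ("motif_A", "exp_A"): 0.80, ("motif_B", "exp_A"): 0.85, ("motif_C", "exp_A"): 0.55,
    ("motif_A", "exp_B"): 0.78, ("motif_B", "exp_B"): 0.74, ("motif_C", "exp_B"): 0.60,
    ("motif_A", "exp_C"): 0.52, ("motif_B", "exp_C"): 0.50, ("motif_C", "exp_C"): 0.90,
}
# Hypothetical mapping from each experiment to the TF family of its target TF.
TOY_EXPERIMENT_FAMILY = {"exp_A": "family_1", "exp_B": "family_1", "exp_C": "family_2"}


def best_motif_per_experiment(perf):
    """Map each experiment to the motif with the highest value (the first seen wins ties)."""
    best = {}
    for (motif, experiment), value in perf.items():
        if experiment not in best or value > best[experiment][1]:
            best[experiment] = (motif, value)
    return {experiment: motif for experiment, (motif, _) in best.items()}


def family_representative(perf, experiment_family, family):
    """Motif with the highest mean value across all experiments of one TF family.

    Assumes a complete all-against-all table, so every motif is averaged over the same experiments.
    """
    family_experiments = {e for e, f in experiment_family.items() if f == family}
    values_by_motif = defaultdict(list)
    for (motif, experiment), value in perf.items():
        if experiment in family_experiments:
            values_by_motif[motif].append(value)
    if not values_by_motif:
        raise ValueError(f"no experiments for family {family!r}")
    return max(values_by_motif, key=lambda m: mean(values_by_motif[m]))
```

In the toy table, the best PWM for `exp_A` is `motif_B` rather than `motif_A`, and `motif_B` is also selected as the representative of `family_1`. With real data, missing cells and ties would need an explicit policy.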

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/171e/7212583/628ed6ee91e1/13059_2020_1996_Fig1_HTML.jpg

Similar articles

abc4pwm: affinity based clustering for position weight matrices in applications of DNA sequence analysis.

BMC Bioinformatics. 2022 Mar 3;23(1):83. doi: 10.1186/s12859-022-04615-z.

BEESEM: estimation of binding energy models using HT-SELEX data.

Bioinformatics. 2017 Aug 1;33(15):2288-2295. doi: 10.1093/bioinformatics/btx191.

Transcription factor-binding k-mer analysis clarifies the cell type dependency of binding specificities and cis-regulatory SNPs in humans.

BMC Genomics. 2023 Oct 7;24(1):597. doi: 10.1186/s12864-023-09692-9.

A comparative analysis of transcription factor binding models learned from PBM, HT-SELEX and ChIP data.

Nucleic Acids Res. 2014 Apr;42(8):e63. doi: 10.1093/nar/gku117. Epub 2014 Feb 5.

PscanChIP: Finding over-represented transcription factor-binding site motifs and their correlations in sequences from ChIP-Seq experiments.

Nucleic Acids Res. 2013 Jul;41(Web Server issue):W535-43. doi: 10.1093/nar/gkt448. Epub 2013 Jun 7.

Comparative analysis of models in predicting the effects of SNPs on TF-DNA binding using large-scale in vitro and in vivo data.

Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae110.

Optimally choosing PWM motif databases and sequence scanning approaches based on ChIP-seq data.

BMC Bioinformatics. 2015 May 1;16:140. doi: 10.1186/s12859-015-0573-5.

High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions.

PLoS Comput Biol. 2010 Sep 9;6(9):e1000916. doi: 10.1371/journal.pcbi.1000916.

Application of experimentally verified transcription factor binding sites models for computational analysis of ChIP-Seq data.

BMC Genomics. 2014 Jan 29;15(1):80. doi: 10.1186/1471-2164-15-80.

Cited by

Variations in flanking or less conserved positions of Reb1 and Abf1 consensus binding sites lead to major changes in their ability to modulate nucleosome sliding activity.

Biol Res. 2025 Jul 29;58(1):53. doi: 10.1186/s40659-025-00627-0.

Benchmarking transcription factor binding site prediction models: a comparative analysis on synthetic and biological data.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf363.

TFEBexplorer: An integrated tool to study genes regulated by the stress-responsive Transcription Factor EB.

Autophagy Rep. 2022 Jul 21;1(1):295-305. doi: 10.1080/27694127.2022.2097822. eCollection 2022.

Asymmetry of Motif Conservation Within Their Homotypic Pairs Distinguishes DNA-Binding Domains of Target Transcription Factors in ChIP-Seq Data.

Int J Mol Sci. 2025 Jan 4;26(1):386. doi: 10.3390/ijms26010386.

Pax proteins mediate segment-specific functions in proximal tubule survival and response to ischemic injury.

Am J Physiol Renal Physiol. 2025 Jan 1;328(1):F95-F106. doi: 10.1152/ajprenal.00289.2024. Epub 2024 Dec 2.

Perspectives on Codebook: sequence specificity of uncharacterized human transcription factors.

bioRxiv. 2024 Nov 12:2024.11.11.622097. doi: 10.1101/2024.11.11.622097.

Cross-platform DNA motif discovery and benchmarking to explore binding specificities of poorly studied human transcription factors.

bioRxiv. 2024 Nov 13:2024.11.11.619379. doi: 10.1101/2024.11.11.619379.

Identifying transcription factors with cell-type specific DNA binding signatures.

BMC Genomics. 2024 Oct 14;25(1):957. doi: 10.1186/s12864-024-10859-1.

Deciphering the impact of genomic variation on function.

Nature. 2024 Sep;633(8028):47-57. doi: 10.1038/s41586-024-07510-0. Epub 2024 Sep 4.

Genomic background sequences systematically outperform synthetic ones in de novo motif discovery for ChIP-seq data.

NAR Genom Bioinform. 2024 Jul 27;6(3):lqae090. doi: 10.1093/nargab/lqae090. eCollection 2024 Sep.
