Jaccard index based similarity measure to compare transcription factor binding site models.

Authors

Vorontsov Ilya E, Kulakovskiy Ivan V, Makeev Vsevolod J

Affiliations

Department of Computational Systems Biology, Vavilov Institute of General Genetics, Russian Academy of Sciences, Gubkina str. 3, Moscow 119991, GSP-1, Russia.

Data Analysis Department, Yandex Data Analysis School, Moscow Institute of Physics and Technology, Leo Tolstoy str. 16, Moscow 119021, Russia.

Publication

Algorithms Mol Biol. 2013 Sep 30;8(1):23. doi: 10.1186/1748-7188-8-23.

Abstract

BACKGROUND

The positional weight matrix (PWM) remains the most popular model for quantifying transcription factor (TF) binding. A PWM supplied with a score threshold defines a set of putative transcription factor binding sites (TFBS), thus providing a TFBS model. TF-bound DNA fragments obtained by different experimental methods usually give similar but not identical PWMs. This is also common for different TFs from the same structural family. Thus it is often necessary to measure the similarity between PWMs. Popular tools compare PWMs directly using matrix elements. Yet, for log-odds PWMs, negative elements do not contribute to the scores of high-scoring TFBS and thus may differ without affecting the sets of best-recognized binding sites. Moreover, the two TFBS sets recognized by a given pair of PWMs can be more or less different depending on the score thresholds.
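As a rough illustration of how a PWM plus a score threshold defines a set of putative sites, the following Python sketch enumerates every word of the matrix length and keeps those scoring at or above the threshold. The 3-bp matrix `TOY_PWM`, its values, and the helper names `score` and `recognized_sites` are hypothetical and chosen only for this example; exhaustive enumeration is practical only for very short matrices.

```python
from itertools import product

# Hypothetical 3-bp log-odds-style PWM (illustrative values only):
# one dict of nucleotide scores per position.
TOY_PWM = [
    {"A": 1.5, "C": -1.0, "G": -0.5, "T": -1.5},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -0.5},
    {"A": -1.5, "C": -1.0, "G": 1.5, "T": 0.0},
]


def score(pwm, word):
    """Sum the per-position scores of `word` (same length as the PWM)."""
    return sum(column[base] for column, base in zip(pwm, word))


def recognized_sites(pwm, threshold):
    """Return every word of the PWM's length scoring at or above `threshold`."""
    words = ("".join(w) for w in product("ACGT", repeat=len(pwm)))
    return {w for w in words if score(pwm, w) >= threshold}


# The same matrix yields different TFBS sets at different thresholds.
strict = recognized_sites(TOY_PWM, 3.0)
relaxed = recognized_sites(TOY_PWM, 2.5)
print(sorted(strict))
print(sorted(relaxed))
```

Lowering the threshold can only enlarge the recognized set, which is why a comparison of two models has to take both thresholds into account.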

RESULTS

We propose a practical approach for comparing two TFBS models, each consisting of a PWM and the respective scoring threshold. The proposed measure is a variant of the Jaccard index between two TFBS sets. The measure defines a metric space for TFBS models of all finite lengths. The algorithm can compare TFBS models constructed using substantially different approaches, such as PWMs built from raw positional counts and from log-odds. We present an efficient software implementation: MACRO-APE (MAtrix CompaRisOn by Approximate P-value Estimation).
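The idea of comparing two TFBS models through the sets of words they recognize can be sketched in Python as below. This is a simplified illustration, not the MACRO-APE algorithm: it computes the plain Jaccard index (the abstract describes a variant), assumes both PWMs have the same length with no shifts or reverse complements, weights all words equally, and enumerates words exhaustively instead of estimating P-values. The matrices `PWM_A` and `PWM_B` and all function names are hypothetical.

```python
from itertools import product


def score(pwm, word):
    """Sum the per-position scores of `word` (same length as the PWM)."""
    return sum(column[base] for column, base in zip(pwm, word))


def recognized_sites(pwm, threshold):
    """Return every word of the PWM's length scoring at or above `threshold`."""
    words = ("".join(w) for w in product("ACGT", repeat=len(pwm)))
    return {w for w in words if score(pwm, w) >= threshold}


def jaccard_similarity(model_a, model_b):
    """Plain Jaccard index of two TFBS models given as (pwm, threshold) pairs."""
    sites_a = recognized_sites(*model_a)
    sites_b = recognized_sites(*model_b)
    union = sites_a | sites_b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(sites_a & sites_b) / len(union)


# Hypothetical matrices: PWM_B has the same positive elements as PWM_A,
# but its negative elements are twice as large.
PWM_A = [
    {"A": 1.5, "C": -1.0, "G": -0.5, "T": -1.5},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -0.5},
    {"A": -1.5, "C": -1.0, "G": 1.5, "T": 0.0},
]
PWM_B = [
    {"A": 1.5, "C": -2.0, "G": -1.0, "T": -3.0},
    {"A": -2.0, "C": 1.0, "G": -2.0, "T": -1.0},
    {"A": -3.0, "C": -2.0, "G": 1.5, "T": 0.0},
]

high = jaccard_similarity((PWM_A, 3.0), (PWM_B, 3.0))
low = jaccard_similarity((PWM_A, 2.5), (PWM_B, 2.5))
print(high, low)
```

At the stricter threshold both toy models recognize the same single word despite their different negative elements, while at the lower threshold the sets diverge. The corresponding distance, one minus the Jaccard index, is a metric on finite sets.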

CONCLUSIONS

MACRO-APE can be effectively used to compute the Jaccard index based similarity for two TFBS models. A two-pass scanning algorithm is presented to scan a given collection of PWMs for PWMs similar to a given query.

AVAILABILITY AND IMPLEMENTATION

MACRO-APE is implemented in Ruby 1.9; the software, including source code and a manual, is freely available at http://autosome.ru/macroape/ and in the supplementary materials.
