Natural similarity measures between position frequency matrices with an application to clustering.

Authors

Pape Utz J, Rahmann Sven, Vingron Martin

Affiliation

Computational Biology, Max Planck Institute for Molecular Genetics, Ihnestr. 73, 14195 Berlin, Germany.

Publication

Bioinformatics. 2008 Feb 1;24(3):350-7. doi: 10.1093/bioinformatics/btm610. Epub 2008 Jan 2.

DOI: 10.1093/bioinformatics/btm610
PMID: 18174183

Abstract

MOTIVATION

Transcription factors (TFs) play a key role in gene regulation by binding to target sequences. In silico prediction of potential binding of a TF to a binding site is a well-studied problem in computational biology. The binding sites for one TF are represented by a position frequency matrix (PFM). The discovery of new PFMs requires the comparison to known PFMs to avoid redundancies. In general, two PFMs are similar if they occur at overlapping positions under a null model. Still, most existing methods compute similarity according to probabilistic distances of the PFMs. Here we propose a natural similarity measure based on the asymptotic covariance between the number of PFM hits incorporating both strands. Furthermore, we introduce a second measure based on the same idea to cluster a set of the Jaspar PFMs.
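The idea of comparing two PFMs through their hit counts can be illustrated by simulation. The following is a minimal sketch, not the authors' implementation: it assumes an i.i.d. uniform background as the null model, a simple pseudocount log-odds conversion, and hypothetical helper names (`log_odds`, `count_hits`, `empirical_covariance`). It counts hits on both strands of random sequences and estimates the covariance of the two hit counts empirically, whereas the paper derives the asymptotic covariance analytically.

```python
import math
import random

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def log_odds(pfm, background=0.25, pseudocount=1.0):
    """Convert a PFM (list of {base: count} columns) into log-odds scores.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    pwm = []
    for col in pfm:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append(
            {b: math.log((col[b] + pseudocount) / total / background) for b in BASES}
        )
    return pwm


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def count_hits(pwm, seq, threshold):
    """Count windows on either strand whose score reaches the threshold."""
    width = len(pwm)
    hits = 0
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - width + 1):
            score = sum(pwm[j][strand[i + j]] for j in range(width))
            if score >= threshold:
                hits += 1
    return hits


def empirical_covariance(pwm_a, thr_a, pwm_b, thr_b,
                         seq_len=200, n_seqs=500, seed=0):
    """Sample covariance of the two hit counts over random null sequences."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_seqs):
        seq = "".join(rng.choice(BASES) for _ in range(seq_len))
        xs.append(count_hits(pwm_a, seq, thr_a))
        ys.append(count_hits(pwm_b, seq, thr_b))
    mean_x = sum(xs) / n_seqs
    mean_y = sum(ys) / n_seqs
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n_seqs - 1)


# Toy PFM with consensus ACG (made-up counts, for illustration only).
toy_pfm = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
]
toy_pwm = log_odds(toy_pfm)
print(empirical_covariance(toy_pwm, 3.0, toy_pwm, 3.0))
```

Under this sketch, two PFMs that tend to hit at overlapping positions produce a positive covariance, while unrelated PFMs produce a value near zero. Simulation of this kind is slow and noisy, which is one reason an analytic computation is attractive.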

RESULTS

We show that the asymptotic covariance can be efficiently computed by a two-dimensional convolution of the score distributions. The asymptotic covariance approach shows strong correlation with simulated data. It outperforms three alternative methods. The Jaspar clustering yields distinct groups of TFs of the same class. Furthermore, a representative PFM is given for each class. In contrast to most other clustering methods, PFMs with low similarity automatically remain singletons.
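The convolution idea can be sketched for a deliberately simplified case. The code below is an illustration under stated assumptions, not the published algorithm: it handles two equal-width matrices aligned without shift on a single strand, requires integer (discretized) scores, and assumes an i.i.d. background. The function names are hypothetical. The joint distribution of the two scores is built column by column, which is a two-dimensional convolution of the per-column score distributions.

```python
from collections import defaultdict

BASES = "ACGT"


def joint_score_distribution(pwm_a, pwm_b, background=None):
    """Joint distribution of the integer scores of two aligned matrices.

    Each matrix is a list of {base: int_score} columns. Starting from a
    point mass at (0, 0), every column pair shifts the probability mass
    by (score_a, score_b) for each base, weighted by its background
    probability.
    """
    if len(pwm_a) != len(pwm_b):
        raise ValueError("this sketch only handles equal-width matrices")
    if background is None:
        background = {b: 0.25 for b in BASES}
    dist = {(0, 0): 1.0}
    for col_a, col_b in zip(pwm_a, pwm_b):
        nxt = defaultdict(float)
        for (score_a, score_b), prob in dist.items():
            for base in BASES:
                key = (score_a + col_a[base], score_b + col_b[base])
                nxt[key] += prob * background[base]
        dist = dict(nxt)
    return dist


def joint_hit_probability(dist, thr_a, thr_b):
    """Probability that both scores reach their thresholds."""
    return sum(p for (sa, sb), p in dist.items() if sa >= thr_a and sb >= thr_b)


def marginal_hit_probability(dist, thr, axis):
    """Probability that one score (axis 0 or 1) reaches its threshold."""
    return sum(p for key, p in dist.items() if key[axis] >= thr)


# Toy integer matrices (made-up scores): consensus AC and consensus GT.
pwm_ac = [{"A": 2, "C": -1, "G": -1, "T": -1}, {"A": -1, "C": 2, "G": -1, "T": -1}]
pwm_gt = [{"A": -1, "C": -1, "G": 2, "T": -1}, {"A": -1, "C": -1, "G": -1, "T": 2}]

dist = joint_score_distribution(pwm_ac, pwm_gt)
p_both = joint_hit_probability(dist, 4, 4)
p_a = marginal_hit_probability(dist, 4, 0)
p_b = marginal_hit_probability(dist, 4, 1)
print(p_both - p_a * p_b)  # covariance of the two hit indicators at one position
```

The printed quantity is the covariance of the two hit indicators at a single aligned position. A full treatment would also have to cover shifted overlaps and the reverse strand, which this sketch leaves out.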

AVAILABILITY

A website to compute the similarity and to perform clustering, the source code and Supplementary Material are available at http://mosta.molgen.mpg.de.
