SPIC: a novel similarity metric for comparing transcription factor binding site motifs based on information contents.

Authors

Zhang Shaoqiang, Zhou Xiguo, Du Chuanbin, Su Zhengchang

Publication information

BMC Syst Biol. 2013;7 Suppl 2(Suppl 2):S14. doi: 10.1186/1752-0509-7-S2-S14. Epub 2013 Dec 17.

DOI:10.1186/1752-0509-7-S2-S14
PMID:24564945
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3866262/
Abstract

BACKGROUND

Discovering transcription factor binding sites (TFBSs) is one of the primary challenges in deciphering the complex gene regulatory networks encoded in a genome. A set of short DNA sequences recognized by a transcription factor (TF) is known as a motif, which can be represented accurately in matrix form, such as a position-specific scoring matrix (PSSM) or a position frequency matrix (PFM). We frequently need to query a motif against a motif database to find similar motifs, merge similar TFBS motifs likely recognized by the same TF, separate irrelevant motifs, or filter out spurious ones. All of these applications therefore require a metric that captures subtle differences between irrelevant motifs while highlighting the similarity among motifs of the same group. Although several motif similarity metrics have been proposed, their performance in these applications is still far from satisfactory.
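The two matrix forms mentioned above can be made concrete with a small sketch. It assumes a PFM stored as one base-count dictionary per column, a uniform background, a pseudocount of 0.25 per base, and base-2 logarithms; these choices and the helper names `pfm_to_ppm`, `ppm_to_pssm` and `information_content` are illustrative assumptions, not taken from the paper.

```python
import math

BASES = "ACGT"


def pfm_to_ppm(pfm, pseudocount=0.25):
    """Turn per-column base counts into per-column probabilities.

    A pseudocount (assumed value) is added to every base so no probability is zero.
    """
    ppm = []
    for column in pfm:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        ppm.append({b: (column[b] + pseudocount) / total for b in BASES})
    return ppm


def ppm_to_pssm(ppm, background=None):
    """Base-2 log-odds score of each base against a background distribution
    (uniform unless another one is supplied; the uniform default is an assumption)."""
    if background is None:
        background = {b: 0.25 for b in BASES}
    return [{b: math.log2(col[b] / background[b]) for b in BASES} for col in ppm]


def information_content(prob_column):
    """Information content in bits: log2(4) minus the column's Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in prob_column.values() if p > 0)
    return math.log2(len(BASES)) - entropy


# A toy three-column motif: fully conserved, two-base, and nearly uniform.
pfm = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 5, "G": 5, "T": 0},
    {"A": 3, "C": 2, "G": 3, "T": 2},
]
ppm = pfm_to_ppm(pfm)
pssm = ppm_to_pssm(ppm)
ics = [information_content(col) for col in ppm]
```

In this toy example the information content drops from the conserved first column to the nearly uniform third one, which is the per-column quantity the METHODS section weights by.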

METHODS

In this paper, we propose a novel metric, SPIC (Similarity with Position Information Contents), for measuring the similarity between a column of one motif and a column of another motif. To define this similarity score, we consider the likelihood that the column of the first motif's PFM could be produced by the column of the second motif's PSSM, multiply this likelihood by the information content of that PSSM column, and do the same with the roles of the two motifs reversed. We evaluated SPIC, combined with a local or a global alignment method with an affine gap penalty function, for computing the similarity between two motifs. We also compared SPIC with seven existing state-of-the-art metrics on three datasets, in terms of their ability to cluster motifs from the same group and to retrieve motifs from a database.
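The column score described in METHODS can be sketched as follows. This is one plausible reading of the abstract, not the paper's exact formula: it assumes the likelihood is the per-site geometric mean, the product over bases of q(b) raised to f(b), where f holds the first column's normalized frequencies and q the second column's probabilities; it treats the second motif's "PSSM column" as a pseudocounted probability vector; and it averages the two directions. The function name `spic_like_column_score`, the helpers, and the pseudocount of 0.25 are hypothetical.

```python
import math

BASES = "ACGT"


def frequencies(counts):
    """Normalize raw base counts of one PFM column to frequencies summing to 1."""
    total = sum(counts[b] for b in BASES)
    return {b: counts[b] / total for b in BASES}


def probabilities(counts, pseudocount=0.25):
    """Pseudocounted base probabilities, standing in for a PSSM column (assumption).
    The pseudocount must be positive so that log() below never sees zero."""
    total = sum(counts[b] for b in BASES) + pseudocount * len(BASES)
    return {b: (counts[b] + pseudocount) / total for b in BASES}


def information_content(prob_column):
    """Information content in bits of a probability column."""
    entropy = -sum(p * math.log2(p) for p in prob_column.values() if p > 0)
    return math.log2(len(BASES)) - entropy


def column_likelihood(freq_column, prob_column):
    """Per-site likelihood that prob_column generates freq_column:
    prod_b q(b) ** f(b), computed as exp(sum_b f(b) * ln q(b))."""
    return math.exp(sum(freq_column[b] * math.log(prob_column[b]) for b in BASES))


def spic_like_column_score(counts_1, counts_2, pseudocount=0.25):
    """Hypothetical SPIC-style score: likelihood x information content,
    computed in both directions and averaged (the averaging is an assumption)."""
    f1, f2 = frequencies(counts_1), frequencies(counts_2)
    q1, q2 = probabilities(counts_1, pseudocount), probabilities(counts_2, pseudocount)
    forward = column_likelihood(f1, q2) * information_content(q2)
    backward = column_likelihood(f2, q1) * information_content(q1)
    return 0.5 * (forward + backward)


a_only = {"A": 10, "C": 0, "G": 0, "T": 0}
t_only = {"A": 0, "C": 0, "G": 0, "T": 10}
uniform = {"A": 5, "C": 5, "G": 5, "T": 5}
match = spic_like_column_score(a_only, a_only)      # conserved and identical: high
mismatch = spic_like_column_score(a_only, t_only)   # conserved but different: low
flat = spic_like_column_score(uniform, uniform)     # identical but uninformative: ~0
```

In this sketch, weighting by information content means two identical but uninformative columns contribute almost nothing, while identical conserved columns score high.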

RESULTS

When used jointly with the Smith-Waterman local alignment method with an affine gap penalty function (gap open penalty 1, gap extension penalty 0.5), SPIC outperforms the seven existing state-of-the-art motif similarity metrics, each combined with its best alignment method, in matching motifs in database searches, clustering sub-motifs of the same TF, and distinguishing relevant motifs from a miscellaneous group of motifs.
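The alignment step can be illustrated with a Smith-Waterman local alignment using affine gaps (Gotoh's three-matrix recurrence) over a precomputed matrix of column-pair scores, with the gap open and extension penalties set to 1 and 0.5 as reported above. Two points are assumptions: a gap of length k is charged `gap_open + (k - 1) * gap_extend`, and the score matrix already contains negative values for dissimilar column pairs (a raw likelihood times information content is never negative, so some offset would be needed for a local alignment to stop extending; how the paper handles this is not stated in the abstract). The function name `smith_waterman_affine` is hypothetical.

```python
def smith_waterman_affine(scores, gap_open=1.0, gap_extend=0.5):
    """Best local alignment score between two motifs.

    scores[i][j] is the (assumed precomputed) similarity of column i of motif 1
    and column j of motif 2. A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n = len(scores)
    m = len(scores[0]) if n else 0
    neg_inf = float("-inf")
    # H: best score of a local alignment ending at (i, j)
    # E: best score ending with a gap in motif 1 (motif 2 column j unmatched)
    # F: best score ending with a gap in motif 2 (motif 1 column i unmatched)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            H[i][j] = max(0.0, H[i - 1][j - 1] + scores[i - 1][j - 1], E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# Toy score matrices: positive on matching column pairs, negative elsewhere.
diagonal = [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]
gapped = [[2, -1, -1], [-1, -1, 2]]
print(smith_waterman_affine(diagonal))  # 6.0
print(smith_waterman_affine(gapped))    # 3.0
```

In the second example, bridging the unmatched middle column of motif 2 with a one-column gap (cost 1) still beats stopping early, so the best local score is 2 - 1 + 2 = 3.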

CONCLUSIONS

We have developed a novel motif similarity metric that can more accurately match motifs in database searches, and more effectively cluster similar motifs and differentiate irrelevant motifs than do the other seven metrics we are aware of.

Similar articles

1
SPIC: a novel similarity metric for comparing transcription factor binding site motifs based on information contents.
BMC Syst Biol. 2013;7 Suppl 2(Suppl 2):S14. doi: 10.1186/1752-0509-7-S2-S14. Epub 2013 Dec 17.
2
A novel alignment-free method for comparing transcription factor binding site motifs.
PLoS One. 2010 Jan 20;5(1):e8797. doi: 10.1371/journal.pone.0008797.
3
Alignment-free clustering of transcription factor binding motifs using a genetic-k-medoids approach.
BMC Bioinformatics. 2015 Jan 28;16:22. doi: 10.1186/s12859-015-0450-2.
4
Simultaneously learning DNA motif along with its position and sequence rank preferences through expectation maximization algorithm.
J Comput Biol. 2013 Mar;20(3):237-48. doi: 10.1089/cmb.2012.0233.
5
CLIMP: Clustering Motifs via Maximal Cliques with Parallel Computing Design.
PLoS One. 2016 Aug 3;11(8):e0160435. doi: 10.1371/journal.pone.0160435. eCollection 2016.
6
Parametric bootstrapping for biological sequence motifs.
BMC Bioinformatics. 2016 Oct 6;17(1):406. doi: 10.1186/s12859-016-1246-8.
7
FISim: a new similarity measure between transcription factor binding sites based on the fuzzy integral.
BMC Bioinformatics. 2009 Jul 20;10:224. doi: 10.1186/1471-2105-10-224.
8
GSMC: Combining Parallel Gibbs Sampling with Maximal Cliques for Hunting DNA Motif.
J Comput Biol. 2017 Dec;24(12):1243-1253. doi: 10.1089/cmb.2017.0100. Epub 2017 Nov 8.
9
An Algorithm for Motif Discovery with Iteration on Lengths of Motifs.
IEEE/ACM Trans Comput Biol Bioinform. 2015 Jan-Feb;12(1):136-41. doi: 10.1109/TCBB.2014.2351793.
10
MATLIGN: a motif clustering, comparison and matching tool.
BMC Bioinformatics. 2007 Jun 8;8:189. doi: 10.1186/1471-2105-8-189.

Cited by

1
A map of cis-regulatory modules and constituent transcription factor binding sites in 80% of the mouse genome.
BMC Genomics. 2022 Oct 19;23(1):714. doi: 10.1186/s12864-022-08933-7.
2
Accurate prediction of cis-regulatory modules reveals a prevalent regulatory genome of humans.
NAR Genom Bioinform. 2021 Jun 17;3(2):lqab052. doi: 10.1093/nargab/lqab052. eCollection 2021 Jun.
3
FisherMP: fully parallel algorithm for detecting combinatorial motifs from large ChIP-seq datasets.
DNA Res. 2019 Jun 1;26(3):231-242. doi: 10.1093/dnares/dsz004.
4
Performance evaluation for MOTIFSIM.
Biol Proced Online. 2018 Dec 18;20:23. doi: 10.1186/s12575-018-0088-3. eCollection 2018.
5
Towards a map of cis-regulatory sequences in the human genome.
Nucleic Acids Res. 2018 Jun 20;46(11):5395-5409. doi: 10.1093/nar/gky338.
6
GSMC: Combining Parallel Gibbs Sampling with Maximal Cliques for Hunting DNA Motif.
J Comput Biol. 2017 Dec;24(12):1243-1253. doi: 10.1089/cmb.2017.0100. Epub 2017 Nov 8.
7
RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections.
Nucleic Acids Res. 2017 Jul 27;45(13):e119. doi: 10.1093/nar/gkx314.
8
Assessment of transfer methods for comparative genomics of regulatory networks in bacteria.
BMC Bioinformatics. 2016 Aug 31;17 Suppl 8(Suppl 8):277. doi: 10.1186/s12859-016-1113-7.
9
CLIMP: Clustering Motifs via Maximal Cliques with Parallel Computing Design.
PLoS One. 2016 Aug 3;11(8):e0160435. doi: 10.1371/journal.pone.0160435. eCollection 2016.
10
Cofunctional Subpathways Were Regulated by Transcription Factor with Common Motif, Common Family, or Common Tissue.
Biomed Res Int. 2015;2015:780357. doi: 10.1155/2015/780357. Epub 2015 Nov 24.