Estimation and efficient computation of the true probability of recurrence of short linear protein sequence motifs in unrelated proteins.

Affiliations

UCD Complex and Adaptive Systems Laboratory, University College Dublin, Dublin, Ireland.

Publication information

BMC Bioinformatics. 2010 Jan 7;11:14. doi: 10.1186/1471-2105-11-14.

DOI:10.1186/1471-2105-11-14
PMID:20055997
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2819990/
Abstract

BACKGROUND

Large datasets of protein interactions provide a rich resource for the discovery of Short Linear Motifs (SLiMs) that recur in unrelated proteins. However, existing methods for estimating the probability of motif recurrence may be biased by the size and composition of the search dataset, such that p-value estimates from different datasets, or from motifs containing different numbers of non-wildcard positions, are not strictly comparable. Here, we develop more exact methods and explore the potential biases of computationally efficient approximations.

RESULTS

A widely used heuristic for the calculation of motif over-representation approximates motif probability by assuming that all proteins have the same length and composition. We introduce pv, which calculates the probability exactly. Second, the recently introduced SLiMFinder statistic Sig accounts for multiple testing (across all possible motifs) in motif discovery. However, it approximates the probability of all other possible motifs, occurring with a score of p or less, as being equal to p. Here, we show that the exhaustive calculation of the probability of all possible motif occurrences that are as rare or rarer than the motif of interest, Sig', may be carried out efficiently by grouping motifs of a common probability (i.e. those which have permuted orders of the same residues). Sig'v, which corrects both approximations, is shown to be uniformly distributed in a random dataset when searching for non-ambiguous motifs, indicating that it is a robust significance measure.
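
To make the first approximation concrete, the Python sketch below compares two ways of computing the probability that a motif occurs in at least k of N unrelated proteins. It is an illustration only and not the authors' implementation of pv: the function names (occurrence_probability, tail_exact, tail_mean) are hypothetical, and the sketch assumes that proteins are independent and that the chance of a match is the same at every start position within a protein. tail_mean treats every protein as having the same (average) occurrence probability, which is roughly what an equal-length, equal-composition heuristic does, while tail_exact keeps one probability per protein and sums the exact distribution by dynamic programming.

```python
from math import comb


def occurrence_probability(p_site, protein_length, motif_length):
    """Probability of at least one match in one protein.

    p_site is the (assumed) chance of a match at any single start position.
    """
    sites = max(protein_length - motif_length + 1, 0)
    return 1.0 - (1.0 - p_site) ** sites


def tail_exact(probs, k):
    """P(at least k proteins contain the motif), one probability per protein."""
    dist = [1.0]  # dist[j] = P(exactly j proteins matched so far)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, weight in enumerate(dist):
            nxt[j] += weight * (1.0 - p)   # this protein has no match
            nxt[j + 1] += weight * p       # this protein has a match
        dist = nxt
    return sum(dist[k:])


def tail_mean(probs, k):
    """Same tail, but with every protein given the mean probability."""
    n = len(probs)
    p = sum(probs) / n
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


# Hypothetical dataset: three proteins of very different lengths.
lengths = [80, 400, 2500]
probs = [occurrence_probability(1e-4, length, 4) for length in lengths]
print(tail_exact(probs, 2), tail_mean(probs, 2))
```

When all proteins share the same probability the two functions agree; they diverge as protein lengths (or compositions) become more heterogeneous, which is the kind of dataset-dependent bias the abstract describes.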

CONCLUSIONS

A method is presented to compute exactly the true probability of a non-ambiguous short protein sequence motif, and the utility of an approximate approach for novel motif discovery across a large number of datasets is demonstrated.
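
The grouping idea mentioned in the Results (motifs that are permutations of the same residues share a common probability) can be sketched as follows. This is a simplified illustration under stated assumptions, not the published Sig' calculation: it uses a per-site probability equal to the product of residue frequencies, ignores wildcard positions, and the names motif_groups and count_as_rare are hypothetical. Instead of enumerating every ordered motif, it enumerates each multiset of residues once and multiplies by the number of distinct orderings.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial, prod


def motif_groups(freqs, k):
    """Yield (probability, number_of_orderings) for each residue multiset of size k.

    freqs maps residue -> background frequency (assumed position-independent).
    """
    for combo in combinations_with_replacement(sorted(freqs), k):
        probability = prod(freqs[residue] for residue in combo)
        orderings = factorial(k)
        for count in Counter(combo).values():
            orderings //= factorial(count)  # multinomial coefficient
        yield probability, orderings


def count_as_rare(freqs, k, p_motif, rel_tol=1e-12):
    """Number of ordered motifs whose probability is p_motif or lower."""
    cutoff = p_motif * (1.0 + rel_tol)  # small tolerance for float rounding
    return sum(n for p, n in motif_groups(freqs, k) if p <= cutoff)


# Toy three-letter alphabet; real protein data would use 20 residues.
toy_freqs = {"A": 0.5, "B": 0.3, "C": 0.2}
print(count_as_rare(toy_freqs, 2, toy_freqs["A"] * toy_freqs["C"]))
```

For an alphabet of n residues and k defined positions, the number of groups is the number of multisets, C(n + k - 1, k), rather than n^k ordered motifs, which is why grouping makes an exhaustive calculation tractable.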
