Profile-based short linear protein motif discovery.

Affiliation

Complex and Adaptive Systems Laboratory, University College Dublin, Ireland.

Publication information

BMC Bioinformatics. 2012 May 18;13:104. doi: 10.1186/1471-2105-13-104.

PMID:22607209

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3534220/

Abstract

BACKGROUND

Short linear protein motifs are attracting increasing attention as functionally independent sites that are typically 3-10 amino acids long and enriched in disordered regions of proteins. Several methods have recently been proposed to discover motifs, represented as simple regular expressions, that are over-represented within a set of proteins. Here, we extend these approaches to profile-based methods, which provide a richer motif representation.
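To make the distinction between the two representations concrete, the following Python sketch contrasts a regular-expression motif with a simple profile, here a log2-odds position-specific scoring matrix (PSSM). It is an illustration only, not MEME or any procedure from the paper: the toy motif `P.[ST]P`, its four aligned instances, the uniform background, the pseudocount of 1, and the helper names `build_pssm` and `best_profile_hit` are all hypothetical. A regular expression either matches or does not, whereas a profile gives every window a graded score built from per-position residue preferences.

```python
import math
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(instances, pseudocount=1.0):
    """Build a log2-odds PSSM from equal-length, pre-aligned motif instances.

    Assumes a uniform background frequency of 1/20 for every amino acid.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(instances[0])):
        column = [inst[pos] for inst in instances]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_profile_hit(pssm, sequence):
    """Return (score, start) of the highest-scoring window; unknown residues score 0."""
    width = len(pssm)
    return max(
        (sum(col.get(aa, 0.0) for col, aa in zip(pssm, sequence[i:i + width])), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical aligned instances of a toy motif (illustrative, not from the paper).
instances = ["PLSP", "PTSP", "PASP", "PLTP"]
regex = re.compile(r"P.[ST]P")
pssm = build_pssm(instances)

for seq in ["MKPLSPEE", "MKPLAPEE"]:
    score, start = best_profile_hit(pssm, seq)
    print(seq, "regex:", bool(regex.search(seq)), "profile:", round(score, 2), "at", start)
```

In this toy run the regular expression rejects `MKPLAPEE` outright, while the profile still scores its `PLAP` window, only lower than the `PLSP` window in `MKPLSPEE`. That graded behaviour is one sense in which a profile is a richer motif representation.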

RESULTS

The profile-based motif discovery method MEME performed relatively poorly for motifs in disordered regions of proteins. However, when we applied evolutionary weighting to account for redundancy among homologous proteins and masked out poorly conserved regions of disordered proteins, the performance of MEME was equivalent to that of regular expression methods. Nevertheless, the two approaches returned different subsets of motifs on both a benchmark dataset and a more realistic discovery dataset.
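The abstract does not say how the evolutionary weighting or the masking was implemented. As a hedged sketch of the two ideas only, the Python below assumes (1) each protein already carries a homology-family label and is weighted by 1 / family size, so a family of homologues counts roughly once, and (2) per-residue conservation scores and disorder flags are precomputed inputs, with disordered residues scoring below a hypothetical threshold of 0.5 replaced by `X`. The function names, inputs, and threshold are illustrative assumptions, not the authors' procedure.

```python
import re
from collections import Counter


def family_weights(families):
    """Weight each sequence by 1 / (size of its homology family).

    `families` maps sequence id -> family label (assumed precomputed, e.g. by
    clustering homologues), so each family contributes a total weight of 1.
    """
    sizes = Counter(families.values())
    return {seq_id: 1.0 / sizes[fam] for seq_id, fam in families.items()}


def mask_poorly_conserved(sequence, conservation, disordered, threshold=0.5):
    """Replace disordered residues whose conservation score is below `threshold` with 'X'."""
    return "".join(
        "X" if is_disordered and score < threshold else residue
        for residue, score, is_disordered in zip(sequence, conservation, disordered)
    )


def weighted_support(pattern, sequences, weights):
    """Sum the weights of sequences that contain at least one match to `pattern`."""
    regex = re.compile(pattern)
    return sum(weights[seq_id] for seq_id, seq in sequences.items() if regex.search(seq))


# Hypothetical data: three homologues (family "A") and one unrelated protein (family "B").
families = {"A1": "A", "A2": "A", "A3": "A", "B1": "B"}
sequences = {"A1": "MKPLSPEE", "A2": "MRPLSPEE", "A3": "MKPLSPQE", "B1": "GGPTSPLL"}
weights = family_weights(families)
print(weighted_support(r"P.[ST]P", sequences, weights))  # close to 2.0, not 4

# Masking: the motif positions are disordered but poorly conserved, so they become 'X'.
masked = mask_poorly_conserved("MKPLSPEE", [0.9, 0.9, 0.2, 0.2, 0.2, 0.2, 0.9, 0.9], [True] * 8)
print(masked, bool(re.search(r"P.[ST]P", masked)))  # MKXXXXEE False
```

Without weighting, the motif would appear to be supported by four proteins; with family weighting, the three homologues together count as one, which is the kind of redundancy correction the abstract describes.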

CONCLUSIONS

Profile-based motif discovery methods complement regular-expression-based methods. Although profile-based methods are more computationally intensive, they are likely to discover motifs currently overlooked by regular expression methods.

Similar articles

Discovering short linear protein motif based on selective training of profile hidden Markov models.

J Theor Biol. 2015 Jul 21;377:75-84. doi: 10.1016/j.jtbi.2015.03.010. Epub 2015 Mar 17.

Estimation and efficient computation of the true probability of recurrence of short linear protein sequence motifs in unrelated proteins.

BMC Bioinformatics. 2010 Jan 7;11:14. doi: 10.1186/1471-2105-11-14.

Prediction of short linear protein binding regions.

J Mol Biol. 2012 Jan 6;415(1):193-204. doi: 10.1016/j.jmb.2011.10.025. Epub 2011 Oct 21.

Computational identification and analysis of protein short linear motifs.

Front Biosci (Landmark Ed). 2010 Jun 1;15(3):801-25. doi: 10.2741/3647.

SLiMPrints: conservation-based discovery of functional motif fingerprints in intrinsically disordered protein regions.

Nucleic Acids Res. 2012 Nov;40(21):10628-41. doi: 10.1093/nar/gks854. Epub 2012 Sep 12.

SLiMDisc: short, linear motif discovery, correcting for common evolutionary descent.

Nucleic Acids Res. 2006 Jul 19;34(12):3546-54. doi: 10.1093/nar/gkl486. Print 2006.

TrieAMD: a scalable and efficient apriori motif discovery approach.

Int J Data Min Bioinform. 2015;13(1):13-30. doi: 10.1504/ijdmb.2015.070833.

ARCS-Motif: discovering correlated motifs from unaligned biological sequences.

Bioinformatics. 2009 Jan 15;25(2):183-9. doi: 10.1093/bioinformatics/btn609. Epub 2008 Dec 9.

SLiMFinder: a probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteins.

PLoS One. 2007 Oct 3;2(10):e967. doi: 10.1371/journal.pone.0000967.

Cited by

The predictive performance of short-linear motif features in the prediction of calmodulin-binding proteins.

BMC Bioinformatics. 2018 Nov 20;19(Suppl 14):410. doi: 10.1186/s12859-018-2378-9.

Large-scale protein function prediction using heterogeneous ensembles.

F1000Res. 2018 Sep 28;7. doi: 10.12688/f1000research.16415.1. eCollection 2018.

DoReMi: context-based prioritization of linear motif matches.

PeerJ. 2014 Mar 20;2:e315. doi: 10.7717/peerj.315. eCollection 2014.

SeqNLS: nuclear localization signal prediction based on frequent pattern mining and linear motif scoring.

PLoS One. 2013 Oct 29;8(10):e76864. doi: 10.1371/journal.pone.0076864. eCollection 2013.

MFSPSSMpred: identifying short disorder-to-order binding regions in disordered proteins based on contextual local evolutionary conservation.

BMC Bioinformatics. 2013 Oct 4;14:300. doi: 10.1186/1471-2105-14-300.

Predicting binding within disordered protein regions to structurally characterised peptide-binding domains.

PLoS One. 2013 Sep 3;8(9):e72838. doi: 10.1371/journal.pone.0072838. eCollection 2013.
