Cheng Alice, Grant Charles E, Noble William S, Bailey Timothy L
Department of Genome Sciences, University of Washington, Seattle, WA, USA.
Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
Bioinformatics. 2019 Aug 15;35(16):2774-2782. doi: 10.1093/bioinformatics/bty1058.
Post-translational modifications (PTMs) of proteins are associated with many significant biological functions and can be identified in high throughput using tandem mass spectrometry. Many PTMs are associated with short sequence patterns called 'motifs' that help localize the modifying enzyme. Accordingly, many algorithms have been designed to identify these motifs from mass spectrometry data. Accurate statistical confidence estimates for discovered motifs are critically important for proper interpretation and for the design of downstream experimental validation.
We describe a method for assigning statistical confidence estimates to PTM motifs, and we demonstrate that this method provides accurate P-values on both simulated and real data. Our methods are implemented in MoMo, a software tool for discovering motifs among sets of PTMs that we make available as a web server and as downloadable source code. MoMo re-implements the two most widely used PTM motif discovery algorithms, motif-x and MoDL, while offering many enhancements. Relative to motif-x, MoMo offers improved statistical confidence estimates and more accurate calculation of motif scores. The MoMo web server offers more proteome databases, more input formats, larger inputs and longer running times than the motif-x web server. Finally, our study demonstrates that the confidence estimates produced by motif-x are inaccurate. This inaccuracy stems in part from the common practice of drawing 'background' peptides from an unshuffled proteome database. Our results thus suggest that many of the papers that use motif-x to find motifs may be reporting results that lack statistical support.
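To make the role of the background set concrete, the following is a minimal Python sketch, not MoMo's implementation. It assumes fixed-width peptides centered on the modified residue, a binomial tail test for enrichment of one residue at one flanking position, and a shuffle that permutes the flanking residues while keeping the central residue fixed. The function names (`binomial_tail`, `shuffle_flanks`, `enrichment_pvalue`) are hypothetical, and the single-test P-value shown here is not corrected for the many position and residue pairs a motif search examines.

```python
import random
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def shuffle_flanks(peptide, rng):
    """Permute the flanking residues; the central residue stays in place.

    Assumption: the peptide has odd width and the modified residue is central.
    """
    mid = len(peptide) // 2
    flanks = list(peptide[:mid] + peptide[mid + 1:])
    rng.shuffle(flanks)
    return "".join(flanks[:mid]) + peptide[mid] + "".join(flanks[mid:])


def enrichment_pvalue(foreground, background, position, residue):
    """Binomial tail P-value for `residue` at `position` in the foreground.

    The background frequency of the residue at that position is used as the
    per-peptide success probability under the null hypothesis.
    """
    n = len(foreground)
    k = sum(pep[position] == residue for pep in foreground)
    p = sum(pep[position] == residue for pep in background) / len(background)
    return binomial_tail(k, n, p)


if __name__ == "__main__":
    rng = random.Random(0)
    foreground = ["ARSPA", "GKSPL", "TLSPE", "AASGA"]  # toy modified peptides
    proteome = ["AASAA", "GPSPL", "TLSGE", "KKSKK"]    # toy background peptides
    shuffled = [shuffle_flanks(pep, rng) for pep in proteome]
    print(enrichment_pvalue(foreground, proteome, 3, "P"))
    print(enrichment_pvalue(foreground, shuffled, 3, "P"))
```

Comparing the P-value obtained against the raw background with the one obtained against the shuffled background illustrates, on toy data only, how the choice of background changes the null frequency and therefore the reported significance.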
The MoMo web server and source code are provided at http://meme-suite.org.
Supplementary data are available at Bioinformatics online.