MCAM: multiple clustering analysis methodology for deriving hypotheses and insights from high-throughput proteomic datasets.

Affiliation

Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

Publication information

PLoS Comput Biol. 2011 Jul;7(7):e1002119. doi: 10.1371/journal.pcbi.1002119. Epub 2011 Jul 21.

DOI:10.1371/journal.pcbi.1002119

PMID:21799663

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3140961/

Abstract

Advances in proteomic technologies continue to substantially accelerate the capability for generating experimental data on protein levels, states, and activities in biological samples. For example, studies on receptor tyrosine kinase signaling networks can now capture the phosphorylation state of hundreds to thousands of proteins across multiple conditions. However, little is known about the function of many of these protein modifications, or the enzymes responsible for modifying them. To address this challenge, we have developed an approach that enhances the power of clustering techniques to infer functional and regulatory meaning of protein states in cell signaling networks. We have created a new computational framework for applying clustering to biological data in order to overcome the typical dependence on specific a priori assumptions and expert knowledge concerning the technical aspects of clustering. Multiple clustering analysis methodology ('MCAM') employs an array of diverse data transformations, distance metrics, set sizes, and clustering algorithms, in a combinatorial fashion, to create a suite of clustering sets. These sets are then evaluated based on their ability to produce biological insights through statistical enrichment of metadata relating to knowledge concerning protein functions, kinase substrates, and sequence motifs. We applied MCAM to a set of dynamic phosphorylation measurements of the ERBB network to explore the relationships between algorithmic parameters and the biological meaning that could be inferred and report on interesting biological predictions. Further, we applied MCAM to multiple phosphoproteomic datasets for the ERBB network, which allowed us to compare independent and incompletely overlapping measurements of phosphorylation sites in the network. We report specific and global differences of the ERBB network stimulated with different ligands and with changes in HER2 expression. Overall, we offer MCAM as a broadly applicable approach for analysis of proteomic data which may help increase the current understanding of molecular networks in a variety of biological problems.
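
The combinatorial idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The option grids (`TRANSFORMS`, `METRICS`, `LINKAGES`, `SET_SIZES`), the use of SciPy hierarchical clustering as the only algorithm family, the hypergeometric test as the enrichment statistic, and the toy data are all assumptions made for illustration; the abstract does not specify any of them.

```python
import itertools

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from scipy.stats import hypergeom

# Hypothetical option grids; the abstract does not list the actual choices.
TRANSFORMS = {
    "raw": lambda x: x,
    "zscore": lambda x: (x - x.mean(axis=1, keepdims=True))
    / x.std(axis=1, keepdims=True),
    "log2": lambda x: np.log2(x),
}
METRICS = ["euclidean", "correlation"]
LINKAGES = ["average", "complete"]  # stand-ins for "clustering algorithms"
SET_SIZES = [2, 3, 4]  # requested number of clusters per set


def build_clustering_sets(data):
    """Return {parameter tuple: cluster labels} for every combination."""
    sets = {}
    for (t_name, transform), metric, method, k in itertools.product(
        TRANSFORMS.items(), METRICS, LINKAGES, SET_SIZES
    ):
        distances = pdist(transform(data), metric=metric)
        tree = linkage(distances, method=method)
        labels = fcluster(tree, t=k, criterion="maxclust")
        sets[(t_name, metric, method, k)] = labels
    return sets


def best_enrichment_pvalue(labels, annotated):
    """Smallest hypergeometric p-value for one annotation over all clusters."""
    total, hits = len(labels), int(annotated.sum())
    pvalues = []
    for cluster in np.unique(labels):
        members = labels == cluster
        overlap = int((members & annotated).sum())
        # P(X >= overlap) when drawing the cluster's members without replacement
        pvalues.append(
            hypergeom.sf(overlap - 1, total, hits, int(members.sum()))
        )
    return min(pvalues)


# Toy data (made up): 12 phosphosites x 6 time points, two temporal profiles.
rng = np.random.default_rng(0)
early = np.array([1, 4, 8, 4, 2, 1], dtype=float)
late = np.array([1, 1, 2, 4, 8, 8], dtype=float)
data = np.vstack(
    [early * rng.uniform(0.8, 1.2, (6, 6)), late * rng.uniform(0.8, 1.2, (6, 6))]
)
# Hypothetical metadata: the first six sites share some annotation.
annotated = np.array([True] * 6 + [False] * 6)

clustering_sets = build_clustering_sets(data)
scores = {
    params: best_enrichment_pvalue(labels, annotated)
    for params, labels in clustering_sets.items()
}
best_params = min(scores, key=scores.get)
print(len(clustering_sets), "clustering sets; best parameters:", best_params)
```

Each key of `scores` records one combination of transformation, metric, linkage, and set size, so the parameter choices can be compared by how strongly their clusters are enriched for the annotation. A real analysis would test many annotations per cluster and would need a multiple-testing correction, which this sketch omits.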

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c7d/3140961/edb5a434bbf6/pcbi.1002119.g001.jpg
