A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data.

Author information

Lai En-Yu, Chen Yi-Hau, Wu Kun-Pin

Affiliations

Institute of Biomedical Informatics, National Yang-Ming University, Taipei 11221, Taiwan.

Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei 11529, Taiwan.

Publication information

PLoS Comput Biol. 2017 Jun 16;13(6):e1005601. doi: 10.1371/journal.pcbi.1005601. eCollection 2017 Jun.

DOI:10.1371/journal.pcbi.1005601

PMID:28622336

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5493430/

Abstract

Approaches for identifying significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data remains difficult because of limited sample size. This limitation also leads to the common practice of using a competitive null, which fundamentally treats genes or proteins as independent units. The independence assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, the sample covariance may not be a precise estimate when the sample size is very limited, which is usually the case for data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed from the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways with sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets. Four are proteomic datasets produced by mass spectrometry: T-cell activation, cAMP/PKA signaling, myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway. The fifth, the protective effect of myocilin via the MAPK signaling pathway, is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions, in agreement with the discussion in the original publications. We implemented the T2-statistic in an R package, T2GA, which is available at https://github.com/roqe/T2GA.
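
To make the idea of a knowledge-based covariance matrix concrete, the following is a minimal sketch of a quadratic-form statistic of the kind the abstract describes. It is not the T2GA implementation. The construction used here (unit diagonal, off-diagonal entries equal to the interaction confidence score, zero where no score exists) is an assumption made for illustration, as are the function names, the protein labels `P1` and `P2`, and the score of 0.5. The input `z` is assumed to be a vector of standardized expression changes for the proteins in one pathway.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def knowledge_covariance(proteins, scores):
    """Assumed construction: 1 on the diagonal, confidence score (0..1) off the diagonal.

    `scores` maps frozenset({protein_a, protein_b}) to a confidence score;
    pairs without an entry are treated as unrelated (0.0).
    """
    n = len(proteins)
    sigma = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = scores.get(frozenset((proteins[i], proteins[j])), 0.0)
            sigma[i][j] = sigma[j][i] = s
    return sigma


def t2_statistic(z, sigma):
    """Quadratic form z' * inverse(sigma) * z, computed without forming the inverse."""
    y = solve(sigma, z)
    return sum(zi * yi for zi, yi in zip(z, y))


# Hypothetical two-protein pathway with one interaction of confidence 0.5.
proteins = ["P1", "P2"]
scores = {frozenset(("P1", "P2")): 0.5}
sigma = knowledge_covariance(proteins, scores)

same_direction = t2_statistic([1.0, 1.0], sigma)       # both proteins go up
opposite_direction = t2_statistic([1.0, -1.0], sigma)  # one up, one down
print(same_direction, opposite_direction)
```

In this toy case the statistic is about 1.33 when the two linked proteins change in the same direction and 4.0 when they change in opposite directions, whereas treating the proteins as independent (identity matrix) would give 2.0 in both cases. A matrix built this way is not guaranteed to be positive definite for larger pathways, so a practical implementation would need to check or adjust for that.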

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bc4/5493430/64fcabaa2bd7/pcbi.1005601.g001.jpg
