A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data.

Author information

Lai En-Yu, Chen Yi-Hau, Wu Kun-Pin

Affiliations

Institute of Biomedical Informatics, National Yang-Ming University, Taipei 11221, Taiwan.

Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei 11529, Taiwan.

Publication information

PLoS Comput Biol. 2017 Jun 16;13(6):e1005601. doi: 10.1371/journal.pcbi.1005601. eCollection 2017 Jun.

Abstract

Approaches for identifying significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data remains difficult because of limited sample size. This limitation also leads to the common practice of testing against a competitive null, which fundamentally treats genes or proteins as independent units. The independence assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them that manifest as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and produce false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, the sample covariance may not be a precise estimate when the sample size is very limited, which is usually the case for data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed from the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure that retains pathways with sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: T-cell activation, cAMP/PKA signaling, myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry, and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions, in agreement with the discussion in the original publications. We implemented the T2-statistic in an R package, T2GA, which is available at https://github.com/roqe/T2GA.
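
To make the idea of a knowledge-based covariance concrete, here is a minimal Python sketch of a T2-type statistic. It is not the T2GA implementation. The function names (`solve`, `chi2_sf`, `knowledge_t2`), the use of confidence scores directly as correlations, the per-protein standard deviations, and the chi-square reference distribution are all illustrative assumptions. The abstract does not specify how the covariance matrix or the p-value is actually computed.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the incomplete-gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, 10000):
        term *= z / (a + n)
        total += term
        if term < total * 1e-16:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def knowledge_t2(ratios, scores, sds):
    """T2 = x' Sigma^-1 x, with Sigma built from prior confidence scores.

    ratios: log expression ratios of the proteins in one pathway
    scores: symmetric matrix of confidence scores in [0, 1], diagonal 1
            (ASSUMPTION: used here directly as correlations)
    sds:    assumed per-protein standard deviations
    """
    p = len(ratios)
    cov = [[scores[i][j] * sds[i] * sds[j] for j in range(p)] for i in range(p)]
    w = solve(cov, ratios)
    t2 = sum(r * wi for r, wi in zip(ratios, w))
    # ASSUMPTION: chi-square with p degrees of freedom as the reference
    # distribution for the self-contained null (mean vector equal to zero).
    return t2, chi2_sf(t2, p)
```

Under these assumptions, two proteins that both shift by one standard deviation give T2 = 2 when treated as independent, but a smaller T2 when their confidence score is 0.5. A concordant change between associated proteins counts as less surprising than the same change in unrelated ones.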
