On the beta-binomial model for analysis of spectral count data in label-free tandem mass spectrometry-based proteomics.

Affiliation

OncoProteomics Laboratory, Department of Medical Oncology, VUmc-Cancer Center Amsterdam, VU University Medical Center, De Boelelaan 1117, 1081 HV Amsterdam, The Netherlands.

Publication information

Bioinformatics. 2010 Feb 1;26(3):363-9. doi: 10.1093/bioinformatics/btp677. Epub 2009 Dec 9.

DOI:10.1093/bioinformatics/btp677

PMID:20007255

Abstract

MOTIVATION

Spectral count data generated from label-free tandem mass spectrometry-based proteomic experiments can be used to quantify protein abundances reliably. Comparing spectral count data from different sample groups, such as control and disease, is an essential step in statistical analysis for determining altered protein levels and for biomarker discovery. Fisher's exact test, the G-test, the t-test and the local-pooled-error technique (LPE) are commonly used for differential analysis of spectral count data. However, our initial experiments in two cancer studies show that the current methods fail to declare as significant, at the 95% confidence level, a number of protein markers that have been judged to be differential on the basis of the biology of the disease and the spectral count numbers. A shortcoming of these tests is that they do not take into account within- and between-sample variations together. Hence, our aim is to improve upon existing techniques by incorporating both the within- and between-sample variations.

RESULTS

We propose to use the beta-binomial distribution to test the significance of differential protein abundances expressed in spectral counts in label-free mass spectrometry-based proteomics. The beta-binomial test naturally normalizes for total sample count. Experimental results show that the beta-binomial test performs favorably in comparison with other methods on several datasets in terms of both true detection rate and false positive rate. In addition, it can be applied for experiments with one or more replicates, and for multiple condition comparisons. Finally, we have implemented a software package for parameter estimation of two beta-binomial models and the associated statistical tests.
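
To make the idea concrete, the following is a minimal sketch of one way a two-group beta-binomial test on spectral counts could be set up. It is not the authors' R package, and it may differ from their two models. It assumes a likelihood ratio test in which each protein's count in a sample is beta-binomial given that sample's total spectral count, with a group mean `mu` and a shared overdispersion `rho`; the function names, the dispersion grid and the example counts are all hypothetical. The p-value uses the chi-square approximation with one degree of freedom, which is only rough for very few replicates.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bb_loglik(counts, totals, mu, rho):
    """Beta-binomial log-likelihood with mean mu and overdispersion rho.

    The binomial coefficient is omitted because it cancels in the
    likelihood ratio.
    """
    a = mu * (1.0 - rho) / rho
    b = (1.0 - mu) * (1.0 - rho) / rho
    return sum(log_beta(k + a, n - k + b) - log_beta(a, b)
               for k, n in zip(counts, totals))

def max_over_mu(counts, totals, rho, iters=60):
    """Maximize the log-likelihood over mu by golden-section search.

    For fixed rho the log-likelihood is concave in mu, so the search
    converges to the global maximum on (0, 1).
    """
    lo, hi = 1e-9, 1.0 - 1e-9
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c = hi - g * (hi - lo)
        d = lo + g * (hi - lo)
        if bb_loglik(counts, totals, c, rho) < bb_loglik(counts, totals, d, rho):
            lo = c
        else:
            hi = d
    return bb_loglik(counts, totals, (lo + hi) / 2.0, rho)

def bb_lrt(counts_a, totals_a, counts_b, totals_b):
    """Likelihood ratio test: one common mean versus one mean per group.

    The overdispersion rho is profiled over a coarse log-spaced grid
    (an assumption made for brevity, not a tuned optimizer).
    Returns (statistic, p_value).
    """
    rhos = [10 ** (-6 + 0.2 * i) for i in range(29)]
    all_counts = list(counts_a) + list(counts_b)
    all_totals = list(totals_a) + list(totals_b)
    ll_null = max(max_over_mu(all_counts, all_totals, r) for r in rhos)
    ll_alt = max(max_over_mu(counts_a, totals_a, r)
                 + max_over_mu(counts_b, totals_b, r) for r in rhos)
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical example: spectral counts of one protein in three control
# and three disease samples, with each sample's total spectral count.
totals = [20000, 21000, 19000]
stat, p = bb_lrt([10, 12, 9], totals, [30, 28, 35], totals)
print(f"LR statistic = {stat:.2f}, p-value = {p:.3g}")
```

Because the sample totals enter the likelihood directly, no separate normalization step is needed in this sketch, which mirrors the normalization property described above. The shared dispersion parameter is what lets between-sample variation be modeled alongside the within-sample counting noise.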

AVAILABILITY AND IMPLEMENTATION

A software package implemented in R is freely available for download at http://www.oncoproteomics.nl/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
