Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843, USA.
Bioinformatics. 2012 Jun 15;28(12):1586-91. doi: 10.1093/bioinformatics/bts193. Epub 2012 Apr 19.
Quantitative mass spectrometry-based proteomics involves statistical inference on protein abundance, based on the intensities of each protein's associated spectral peaks. However, typical MS-based proteomics datasets have substantial proportions of missing observations, due at least in part to censoring of low intensities. This complicates intensity-based differential expression analysis.
We outline a statistical method for protein differential expression, based on a simple Binomial likelihood. By modeling peak intensities as binary, in terms of 'presence/absence', we enable the selection of proteins not typically amenable to quantitative analysis, e.g. 'one-state' proteins that are present in one condition but absent in another. In addition, we present an analysis protocol that combines quantitative and presence/absence analysis of a given dataset in a principled way, resulting in a single list of selected proteins with a single associated false discovery rate.
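To make the presence/absence idea concrete, the following is a minimal sketch of a Binomial likelihood ratio test for a single protein observed in two conditions. It is an illustration under stated assumptions, not the authors' implementation: the function names (`binom_loglik`, `presence_absence_lrt`), the per-protein counting of samples in which the protein is detected, and the use of an asymptotic chi-square reference distribution with one degree of freedom are all assumptions introduced here.

```python
import math


def binom_loglik(k, n, p):
    """Binomial log-likelihood of k presences in n samples, omitting log C(n, k).

    Terms with a zero count are skipped so that p = 0 or p = 1 is handled
    without evaluating log(0).
    """
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def presence_absence_lrt(k1, n1, k2, n2):
    """Hypothetical helper: likelihood ratio test of equal presence probability.

    k1 of n1 samples show the protein in condition 1, k2 of n2 in condition 2.
    Returns (statistic, approximate p-value).
    """
    # Maximum likelihood estimates under the alternative (separate rates)
    p1, p2 = k1 / n1, k2 / n2
    # Maximum likelihood estimate under the null (one shared rate)
    p0 = (k1 + k2) / (n1 + n2)

    ll_alt = binom_loglik(k1, n1, p1) + binom_loglik(k2, n2, p2)
    ll_null = binom_loglik(k1, n1, p0) + binom_loglik(k2, n2, p0)

    # Guard against tiny negative values caused by floating-point rounding
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Survival function of a chi-square variable with 1 degree of freedom
    # (assumed asymptotic reference; it is rough for small sample sizes)
    pval = math.erfc(math.sqrt(stat / 2.0))
    return stat, pval


if __name__ == "__main__":
    # A 'one-state' protein: detected in all 6 samples of one condition, none of the other
    print(presence_absence_lrt(6, 6, 0, 6))
    # A protein detected equally often in both conditions
    print(presence_absence_lrt(3, 6, 3, 6))
```

A 'one-state' protein yields a large statistic even though no intensity ratio can be computed for it, which is the kind of protein an intensity-based analysis would miss. With the small group sizes typical of proteomics experiments, the chi-square approximation is coarse, so an exact or permutation-based reference distribution may be preferable in practice.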
All R code is available at http://www.stat.tamu.edu/~adabney/share/xuan_code.zip