Department of Statistics, Purdue University, West Lafayette, Indiana 47907, USA.
J Proteome Res. 2009 Nov;8(11):5275-84. doi: 10.1021/pr900610q.
The goal of many LC-MS proteomic investigations is to quantify and compare the abundance of proteins in complex biological mixtures. However, the output of an LC-MS experiment is not a list of proteins, but a list of quantified spectral features. To draw protein-level conclusions, researchers typically apply ad hoc rules, or average the feature abundances to obtain a single protein-level quantity for each sample. We argue that both approaches are inadequate. We discuss two statistical models, namely fixed and mixed effects Analysis of Variance (ANOVA), which view individual features as replicate measurements of a protein's abundance and explicitly account for this redundancy. We demonstrate, using a spike-in and a clinical data set, that the proposed models improve the sensitivity and specificity of testing, improve the accuracy of patient-specific protein quantifications, and are more robust in the presence of missing data.
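To make the missing-data point concrete, the following is a minimal sketch, not the authors' ANOVA implementation. It assumes a simplified additive model in which each log-intensity is a feature effect plus a sample effect, and fits it by alternating averages. The data matrix, the function names `naive_average` and `fit_additive`, and all numbers are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: an additive "feature + sample" model fitted by
# alternating averages. This is NOT the ANOVA procedure from the paper.

# Hypothetical log-intensities for one protein (rows: features, columns: samples).
# Built noise-free as feature_effect + sample_effect, with feature effects
# 20, 24, 28 and sample effects 0, 0, 1, 1.
# None marks a missing value: the most intense feature is missing in sample 2.
y = [
    [20.0, 20.0, 21.0, 21.0],
    [24.0, 24.0, 25.0, 25.0],
    [28.0, 28.0, None, 29.0],
]


def naive_average(y):
    """Per-sample mean of whichever features happen to be observed."""
    n_samp = len(y[0])
    out = []
    for s in range(n_samp):
        vals = [row[s] for row in y if row[s] is not None]
        out.append(sum(vals) / len(vals))
    return out


def fit_additive(y, n_iter=200):
    """Least-squares fit of y[f][s] ~ feat[f] + samp[s] on observed cells only."""
    n_feat, n_samp = len(y), len(y[0])
    feat = [0.0] * n_feat
    samp = [0.0] * n_samp
    for _ in range(n_iter):
        # Update each feature effect given the current sample effects.
        for f in range(n_feat):
            vals = [y[f][s] - samp[s] for s in range(n_samp) if y[f][s] is not None]
            feat[f] = sum(vals) / len(vals)
        # Update each sample effect given the current feature effects.
        for s in range(n_samp):
            vals = [y[f][s] - feat[f] for f in range(n_feat) if y[f][s] is not None]
            samp[s] = sum(vals) / len(vals)
    return feat, samp


naive = naive_average(y)
feat, samp = fit_additive(y)

# Difference between sample 2 and sample 0; the value used to build the data is +1.
naive_diff = naive[2] - naive[0]
model_diff = samp[2] - samp[0]
print("naive difference:", naive_diff)
print("model difference:", round(model_diff, 6))
```

In this toy example the simple average reports sample 2 as lower than sample 0 (a difference of -1), purely because its most intense feature is missing, whereas the additive fit recovers a difference close to the +1 used to construct the data. Only differences between sample effects are interpreted here, since the overall level can shift between the feature and sample terms without changing the fit.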