Roxas Bryan A P, Li Qingbo
Center for Pharmaceutical Biotechnology, University of Illinois at Chicago, Chicago, IL 60607, USA.
BMC Bioinformatics. 2008 Apr 10;9:187. doi: 10.1186/1471-2105-9-187.
Although fold change is a commonly used criterion in quantitative proteomics for differentiating regulated proteins, it does not provide estimates of the false positive and false negative rates, which are often desirable in large-scale quantitative proteomic analysis. We explore the possibility of applying the Significance Analysis of Microarrays (SAM) method (PNAS 98:5116-5121) to a differential proteomics problem of two samples with replicates. The quantitative proteomic analysis was carried out with nanoliquid chromatography/linear ion trap-Fourier transform mass spectrometry. The biological sample model comprised two unlabeled Mycobacterium smegmatis cell cultures grown at pH 5 and pH 7. The objective was to compare relative protein abundance between the two cultures, with an emphasis on significance analysis of differential protein expression using the SAM method. Results from the SAM method are compared with those obtained by fold change and the conventional t-test.
We have applied the SAM method to the two-sample significance analysis problem in liquid chromatography/mass spectrometry (LC/MS) based quantitative proteomics. We grew the pH 5 and pH 7 unlabeled cell cultures in triplicate, resulting in 6 biological replicates. Each biological replicate was mixed with cells from a common 15N-labeled reference culture for normalization prior to SDS/PAGE fractionation and LC/MS analysis. For each biological replicate, one center SDS/PAGE gel fraction was selected for triplicate LC/MS analysis. There were 121 proteins quantified in at least 5 of the 6 biological replicates. Of these 121 proteins, 106 were significantly differentially expressed by the t-test (p < 0.05) based on peptide-level replicates, 54 by SAM with a Delta = 0.68 cutoff and a false positive rate of 5%, and 29 by the t-test (p < 0.05) based on protein-level replicates. The results indicate that SAM appears to overcome the false positives one encounters using the peptide-based t-test while allowing identification of a greater number of differentially expressed proteins than the protein-based t-test.
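As a rough illustration of the statistic behind SAM, the sketch below computes the "relative difference" d for a single protein from two groups of replicate ratios, following the form given in the cited PNAS paper: the difference of group means divided by a pooled standard error plus a small constant s0. The function name `sam_relative_difference`, the fixed s0 value, and the replicate values are hypothetical and chosen only for illustration. In SAM proper, s0 is tuned from the data rather than set by hand.

```python
import math
from statistics import mean

def sam_relative_difference(x, y, s0):
    """SAM-style relative difference for one protein.

    x, y : replicate abundance values (e.g. log ratios) for the two conditions
    s0   : small positive "fudge" constant; an assumed fixed value here
    """
    n1, n2 = len(x), len(y)
    mx, my = mean(x), mean(y)
    # Pooled sum of squared deviations from each group mean
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    # Pooled standard error of the difference in means
    s = math.sqrt((1.0 / n1 + 1.0 / n2) * ss / (n1 + n2 - 2))
    return (my - mx) / (s + s0)

# Hypothetical triplicate values for one protein at pH 5 and pH 7
ph5 = [1.0, 1.1, 0.9]
ph7 = [2.0, 2.1, 1.9]
d_plain = sam_relative_difference(ph5, ph7, s0=0.0)   # reduces to an ordinary t-like statistic
d_damped = sam_relative_difference(ph5, ph7, s0=0.1)  # s0 shrinks d when the variance is small
```

With s0 = 0 the statistic is the equal-variance two-sample t statistic. A positive s0 keeps proteins with very small replicate variance from producing extreme scores.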
We demonstrate that the SAM method can be adapted for effective significance analysis of proteomic data. It provides much richer information about the protein differential expression profiles and is particularly useful in the estimation of false discovery rates and miss rates.
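The false discovery rate estimate mentioned above comes from permuting sample labels. The following is a minimal sketch of that idea under simplifying assumptions, not the full SAM procedure: it uses a symmetric cutoff on |d| instead of the Delta rule applied to ordered statistics, and it omits the estimate of the proportion of unchanged proteins. The names `d_stat` and `estimate_fdr`, the threshold, s0, and the data are all hypothetical.

```python
import math
from itertools import combinations
from statistics import mean

def d_stat(x, y, s0):
    """SAM-style relative difference for one protein (s0 is an assumed constant)."""
    n1, n2 = len(x), len(y)
    mx, my = mean(x), mean(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    s = math.sqrt((1.0 / n1 + 1.0 / n2) * ss / (n1 + n2 - 2))
    return (my - mx) / (s + s0)

def estimate_fdr(data, n1, threshold, s0=0.1):
    """Estimate the FDR for calling proteins with |d| >= threshold.

    data : list of rows; each row holds n1 values for condition 1
           followed by the values for condition 2
    Returns (number called, expected false calls, estimated FDR).
    """
    n = len(data[0])
    observed = [d_stat(row[:n1], row[n1:], s0) for row in data]
    called = sum(abs(d) >= threshold for d in observed)

    # Enumerate every way of assigning n1 of the n samples to condition 1
    false_counts = []
    for idx in combinations(range(n), n1):
        chosen = set(idx)
        count = 0
        for row in data:
            x = [row[i] for i in idx]
            y = [row[i] for i in range(n) if i not in chosen]
            if abs(d_stat(x, y, s0)) >= threshold:
                count += 1
        false_counts.append(count)

    expected_false = sum(false_counts) / len(false_counts)
    fdr = min(1.0, expected_false / called) if called else 0.0
    return called, expected_false, fdr

# Hypothetical data: one clearly changed protein and one unchanged protein,
# each with 3 replicates per condition
data = [
    [1.0, 1.1, 0.9, 2.0, 2.1, 1.9],
    [1.0, 1.2, 0.8, 1.1, 0.9, 1.0],
]
called, expected_false, fdr = estimate_fdr(data, n1=3, threshold=3.0)
```

With 3 replicates per condition there are only 20 distinct label assignments, so they can be enumerated exhaustively. Larger designs would typically sample permutations at random.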