Li Qingbo, Roxas Bryan Ap
Center for Pharmaceutical Biotechnology, College of Pharmacy, University of Illinois at Chicago, Chicago, IL 60607, USA.
BMC Bioinformatics. 2009 Feb 2;10:43. doi: 10.1186/1471-2105-10-43.
Many studies have provided algorithms or methods for assessing statistical significance in quantitative proteomics when multiple replicates of a protein sample and of an LC/MS analysis are available. However, confidence is still lacking when datasets without protein sample replicates are used for biological interpretation. Although a fold-change is a conventional threshold that can be used when there are no sample replicates, it does not provide an assessment of statistical significance such as a false discovery rate (FDR), which is an important indicator of how reliably differentially expressed proteins are identified. In this work, we investigate whether differentially expressed proteins can be detected with statistical significance from a pair of unlabeled protein samples without replicates and with only duplicate LC/MS injections per sample. An FDR is used to gauge the statistical significance of the differentially expressed proteins.
We experimented with several parameters to control the FDR, including a fold-change, a statistical test, and a minimum number of permuted significant pairings. Although none of these parameters alone gives satisfactory control of the FDR, we find that a combination of them provides a very effective means of controlling the FDR without compromising sensitivity. The results suggest that a significance analysis can be performed without protein sample replicates; only duplicate LC/MS injections per sample are needed. We illustrate that differentially expressed proteins can be detected with an FDR between 0 and 15% at a positive rate of 4-16%. The method is evaluated for its sensitivity and specificity by ROC analysis, and is further validated with a [15N]-labeled internal-standard protein sample and additional unlabeled protein sample replicates.
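The way the three parameters combine can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the statistical test, the thresholds, or how the pairings are built, so every name and value below is a hypothetical assumption. These include `is_significant`, `min_fold`, `alpha`, `min_pairings`, the four pairings per protein, and the example numbers. The sketch assumes that each protein already has a fold-change and a p-value for every pairing of injections between the two samples, and that a reference list of truly changed proteins is available for computing the FDR, as it might be in a validation experiment.

```python
from math import log2


def is_significant(pairings, min_fold=2.0, alpha=0.05, min_pairings=3):
    """Call a protein differentially expressed only if enough pairings agree.

    pairings: list of (fold_change, p_value) tuples, one per injection pairing.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    threshold = log2(min_fold)
    passing = sum(
        1
        for fold, p_value in pairings
        # A pairing passes only if it clears BOTH the fold-change and the test.
        if fold > 0 and abs(log2(fold)) >= threshold and p_value <= alpha
    )
    return passing >= min_pairings


def positive_rate(calls):
    """Fraction of all proteins that were called significant."""
    return sum(calls) / len(calls) if calls else 0.0


def false_discovery_rate(calls, truly_changed):
    """Fraction of called proteins that are not truly changed."""
    called_truth = [truth for call, truth in zip(calls, truly_changed) if call]
    if not called_truth:
        return 0.0
    return sum(1 for truth in called_truth if not truth) / len(called_truth)


# Hypothetical data: four pairings per protein (2 injections x 2 injections).
proteins = {
    "P1": [(2.5, 0.01), (2.2, 0.02), (2.8, 0.01), (2.1, 0.04)],
    "P2": [(1.1, 0.60), (0.9, 0.70), (1.2, 0.40), (1.0, 0.90)],
    "P3": [(2.4, 0.03), (1.3, 0.30), (2.1, 0.04), (1.2, 0.50)],
    "P4": [(0.40, 0.01), (0.45, 0.02), (0.48, 0.03), (0.80, 0.20)],
}
# Hypothetical ground truth, e.g. from a spiked-in reference.
truth = {"P1": True, "P2": False, "P3": True, "P4": False}

names = list(proteins)
calls = [is_significant(proteins[name]) for name in names]
fdr = false_discovery_rate(calls, [truth[name] for name in names])

for name, call in zip(names, calls):
    print(name, "significant" if call else "not significant")
print("positive rate:", positive_rate(calls))
print("FDR:", fdr)
```

In this toy example, P3 clears the fold-change and the test in only two of four pairings and is rejected by the pairing-count criterion. Raising or lowering `min_pairings` trades sensitivity against FDR, which is the kind of tuning the text describes.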
We demonstrate that statistical significance can be inferred without protein sample replicates in label-free quantitative proteomics. The approach described in this study should be useful in many exploratory experiments where sample amount or instrument time is limited. Naturally, the method is also suitable for proteomics experiments where multiple sample replicates are available. It is simple, and it complements more sophisticated algorithms that are not designed to handle a small number of sample replicates.