Laboratoire des Sciences de l'Environnement Marin, LEMAR UMR 6539 CNRS/UBO/IRD/Ifremer, Institut Universitaire Européen de la Mer, Université de Bretagne Occidentale, 29280 Plouzané, France.
Bioinformatics. 2013 Nov 1;29(21):2729-34. doi: 10.1093/bioinformatics/btt464. Epub 2013 Aug 27.
Two-dimensional electrophoresis (2-DE) is a crucial method in proteomics that allows the characterization of protein function and expression. This usually involves identifying proteins that are differentially expressed between two contrasting conditions, for example, healthy versus diseased in human proteomics biomarker discovery, or stressful conditions versus control in animal experimentation. The statistical procedures that lead to such identifications are critical steps in the 2-DE analysis workflow. They include a normalization step, a statistical test, and a correction of the resulting probabilities for multiple testing. Statistical issues caused by the high dimensionality of the data and by large-scale multiple testing have been a more active topic in transcriptomics than in proteomics, especially in microarray analysis. We thus propose to adapt innovative statistical tools developed for microarray analysis and to incorporate them into the 2-DE analysis pipeline.
In this article, we evaluate the performance of different normalization procedures, statistical tests and false discovery rate calculation methods on both real and simulated datasets. We demonstrate that the use of statistical procedures adapted from microarrays leads to a notable increase in power as well as a minimization of the false-positive discovery rate. More specifically, we obtained the best results in terms of reliability and sensitivity when using the 'moderated t-test' from Smyth in association with the classic false discovery rate from Benjamini and Hochberg.
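As an illustration of the multiple-testing correction mentioned above, the following is a minimal Python sketch of the Benjamini and Hochberg step-up adjustment. It is not the prot2D implementation (prot2D is an R package); the function name `benjamini_hochberg` and the example p-values are hypothetical, chosen only to show how per-spot p-values could be converted to adjusted values and compared against a 5% threshold.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, scaling each by m / rank
    # and keeping a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for five protein spots (illustrative only).
spot_p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted = benjamini_hochberg(spot_p_values)

# Spots declared differentially expressed at a 5% false discovery rate.
significant = [q <= 0.05 for q in adjusted]
```

With these made-up inputs, the two smallest p-values remain below the 5% threshold after adjustment, while the spots with raw p-values of 0.039 and 0.041 no longer do, which shows why uncorrected per-spot testing would overstate the number of discoveries.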
The methods discussed are freely available in the 'prot2D' open-source R package from Bioconductor (http://www.bioconductor.org//) under the terms of the GNU General Public License (version 2 or later).
sebastien.artigaud@univ-brest.fr or sebastien.artigaud@gmx.com.