Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Bern, 3010, Switzerland.
Department for BioMedical Research, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
BMC Bioinformatics. 2019 Nov 9;20(1):563. doi: 10.1186/s12859-019-3144-3.
Data from discovery proteomic and phosphoproteomic experiments typically include missing values that correspond to proteins that have not been identified in the analyzed sample. Replacing the missing values with random numbers, a process known as "imputation", avoids apparent infinite fold-change values. However, the procedure comes at a cost: imputing a large number of missing values can substantially alter the results of the subsequent differential expression analysis.
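To make the trade-off concrete, here is a minimal sketch of how a missing value produces an infinite log fold change and how random imputation replaces it with a finite but arbitrary one. The intensities, the helper `impute_low`, and the imputation range are hypothetical choices for illustration only; they are not taken from the paper.

```python
import numpy as np

# Hypothetical intensities for four proteins in two conditions.
# 0.0 marks a protein that was not identified in that sample.
control = np.array([1200.0, 850.0, 0.0, 430.0])
treated = np.array([1100.0, 0.0, 640.0, 900.0])

# Without imputation, a zero on either side gives an infinite log2 fold change.
with np.errstate(divide="ignore"):
    log2_fc = np.log2(treated) - np.log2(control)


def impute_low(values, rng, low=10.0, high=50.0):
    """Replace zeros with random draws from an assumed low-intensity range."""
    out = values.copy()
    missing = out == 0.0
    out[missing] = rng.uniform(low, high, size=missing.sum())
    return out


# With imputation, every fold change is finite, but the values for the
# affected proteins depend on the assumed range and on the random draw.
rng = np.random.default_rng(0)
log2_fc_imputed = np.log2(impute_low(treated, rng)) - np.log2(impute_low(control, rng))

print(log2_fc)
print(log2_fc_imputed)
```

Changing `low`, `high`, or the seed changes the fold changes of the imputed proteins, which is the kind of sensitivity the abstract warns about.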
We propose a method that identifies differentially expressed proteins by ranking their observed changes relative to the changes observed for other proteins. The method takes missing values into account directly, without the need to impute them. We illustrate the performance of the new method on two distinct datasets and show that it is robust to missing values while providing results that are otherwise similar to those obtained with edgeR, a state-of-the-art differential expression analysis method.
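The general idea of ranking changes while skipping missing values can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' algorithm or the API of their package: the function `mean_rank_score`, the pairing of replicate i in one condition with replicate i in the other, and the example data are all hypothetical.

```python
import numpy as np


def mean_rank_score(log2_control, log2_treated):
    """Average fractional rank of each protein's log2 change (toy scoring).

    Both inputs are (n_proteins, n_replicates) arrays of log2 intensities,
    with NaN for missing values. Replicate i of one condition is paired with
    replicate i of the other (an assumption of this sketch). Pairs with a
    missing value are skipped rather than imputed. Scores near 1 suggest
    up-regulation, scores near 0 suggest down-regulation, and NaN means the
    protein had no usable pair.
    """
    diff = log2_treated - log2_control  # NaN wherever either value is missing
    frac_rank = np.full(diff.shape, np.nan)
    for j in range(diff.shape[1]):
        observed = ~np.isnan(diff[:, j])
        values = diff[observed, j]
        # Rank 1 is the most negative change; ties are broken by position.
        ranks = values.argsort().argsort() + 1
        frac_rank[observed, j] = ranks / observed.sum()
    counts = (~np.isnan(frac_rank)).sum(axis=1)
    sums = np.nansum(frac_rank, axis=1)
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)


nan = np.nan
# Hypothetical log2 intensities: 5 proteins, 3 paired replicates.
control = np.array([[10.0, 10.2, 9.9],
                    [12.0, nan, 12.1],
                    [8.0, 8.1, nan],
                    [11.0, 11.0, 11.1],
                    [nan, nan, nan]])
treated = np.array([[10.1, 10.1, 10.0],
                    [14.0, 14.2, 14.1],
                    [6.0, 6.2, 6.1],
                    [11.3, 11.2, 11.1],
                    [9.0, 9.1, nan]])

scores = mean_rank_score(control, treated)
print(scores)
```

In this toy data the second protein increases in every pair where it is observed and receives the highest score, the third decreases and receives the lowest, and the last has no complete pair and is left unscored. A real analysis would also need a significance estimate, which this sketch does not attempt.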
The new method for the differential expression analysis of proteomic data is available as an easy-to-use Python package.