Institut de Recherche Mathématique Avancée, UMR 7501, CNRS-Université de Strasbourg, Strasbourg, France.
Laboratoire de Spectrométrie de Masse Bio-Organique, Institut Pluridisciplinaire Hubert Curien, UMR 7178, CNRS-Université de Strasbourg, Strasbourg, France.
PLoS Comput Biol. 2022 Aug 29;18(8):e1010420. doi: 10.1371/journal.pcbi.1010420. eCollection 2022 Aug.
Imputing missing values is common practice in label-free quantitative proteomics. Imputation replaces a missing value with a user-defined one. However, downstream analyses often do not account for the imputation step itself: imputed datasets are treated as if they had always been complete, so the uncertainty due to imputation is not adequately taken into account. We provide a rigorous multiple imputation strategy that, thanks to Rubin's rules, leads to a less biased estimation of the parameters' variability. The resulting imputation-based variance estimator of the peptide intensities is then moderated using Bayesian hierarchical models and finally included in moderated t-test statistics to provide differential analysis results. This workflow can be used on both peptide-level and protein-level quantification datasets; for protein-level results based on peptide-level quantification data, an aggregation step is included. Our methodology, named mi4p, was compared to the state-of-the-art limma workflow implemented in the DAPAR R package, on both simulated and real datasets. We observed a trade-off between sensitivity and specificity, while mi4p outperforms DAPAR overall in terms of F-score.
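The pooling performed by Rubin's rules can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the mi4p implementation (mi4p is an R package); the function name `pool_rubin` and the numeric values are hypothetical. Given D imputed datasets, each yielding an estimate and its squared standard error, the pooled estimate is the mean of the estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/D).

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules.

    estimates: one point estimate per imputed dataset
    variances: the squared standard error of each estimate
    Returns (pooled_estimate, total_variance).
    """
    d = len(estimates)
    if d < 2 or d != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    pooled = mean(estimates)       # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # sample variance across imputations (n - 1)
    total = within + (1 + 1 / d) * between
    return pooled, total


# Hypothetical values for one peptide across D = 3 imputed datasets.
estimates = [1.0, 2.0, 3.0]
variances = [0.5, 0.5, 0.5]
pooled, total = pool_rubin(estimates, variances)
```

The between-imputation term is what an analysis of a single imputed dataset ignores: it is zero only when every imputation yields the same estimate, so treating imputed data as complete understates the variance whenever the imputations disagree.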