Dealing with missing values in proteomics data.

Affiliations

Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore.

School of Biological Sciences, Nanyang Technological University, Singapore, Singapore.

Publication information

Proteomics. 2022 Dec;22(23-24):e2200092. doi: 10.1002/pmic.202200092. Epub 2022 Nov 17.

DOI:10.1002/pmic.202200092

PMID:36349819

Abstract

Proteomics data are often plagued by missing values (MVs). These MVs threaten the integrity of subsequent statistical analyses by reducing statistical power, introducing bias, and failing to represent the true sample. Over the years, several categories of missing value imputation (MVI) methods have been developed and adapted for proteomics data. These MVI methods rest on different prior assumptions (e.g., that data are normally or independently distributed) and operating principles (e.g., the algorithm is built to address random missingness only), so their performance varies even on the same dataset. Thus, to achieve a satisfactory outcome, a suitable MVI method must be selected. To guide this choice, we provide a decision chart that facilitates strategic consideration of datasets with different characteristics. We also draw attention to other issues that can affect MVI performance, such as the presence of confounders (e.g., batch effects). These, too, should be considered during or before MVI.
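As a minimal sketch of why the choice of MVI method matters, the Python snippet below applies two simple imputation rules to the same small matrix. The two rules (per-protein mean and per-protein minimum), the function names, and the toy log-intensity values are all assumptions made for illustration; they are not taken from the article or its decision chart. Mean filling implicitly treats values as missing at random, whereas minimum filling treats them as missing because of low abundance, so the two rules disagree on every filled entry.

```python
import numpy as np


def impute_row_mean(x):
    """Fill each protein's missing values with that protein's observed mean.

    Illustrative only: implicitly assumes values are missing at random.
    """
    filled = x.copy()
    row_means = np.nanmean(filled, axis=1, keepdims=True)
    mask = np.isnan(filled)
    filled[mask] = np.broadcast_to(row_means, filled.shape)[mask]
    return filled


def impute_row_min(x):
    """Fill each protein's missing values with that protein's observed minimum.

    Illustrative only: implicitly assumes values are missing because the
    protein fell below the detection limit (abundance-dependent missingness).
    """
    filled = x.copy()
    row_mins = np.nanmin(filled, axis=1, keepdims=True)
    mask = np.isnan(filled)
    filled[mask] = np.broadcast_to(row_mins, filled.shape)[mask]
    return filled


# Hypothetical log-intensity matrix: rows are proteins, columns are samples.
log_intensity = np.array([
    [20.0, 21.0, np.nan, 19.0],
    [15.0, np.nan, np.nan, 14.0],
    [25.0, 24.0, 26.0, 25.0],
])

# Fraction of missing values per protein, a common first diagnostic.
missing_fraction = np.isnan(log_intensity).mean(axis=1)

mean_filled = impute_row_mean(log_intensity)
min_filled = impute_row_min(log_intensity)

print("missing fraction per protein:", missing_fraction)
print("mean-imputed:\n", mean_filled)
print("min-imputed:\n", min_filled)
```

Both helpers assume every protein has at least one observed value; a row that is entirely missing would stay missing and would need to be filtered out beforehand.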
