1 Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, the Netherlands.
2 Center for Medical Biometry and Medical Informatics, University of Freiburg, Freiburg, Germany.
Stat Methods Med Res. 2018 Sep;27(9):2742-2755. doi: 10.1177/0962280216685742. Epub 2016 Dec 23.
In this paper, we consider the problem of calibrating diagnostic rules based on high-resolution mass spectrometry data subject to a limit of detection. The limit of detection reflects the inability of instruments to measure low-concentration proteins reliably. As a consequence, peak intensities below the limit of detection are often reported as missing during the quantification step of proteomic analysis. We propose using censored data methodology to handle spectral measurements in the presence of a limit of detection, recognizing that measurements of low-abundance proteins are left-censored. We replace the incomplete spectral measurements with estimates of the expected intensity and use those as input to a prediction model. To correct for lack of information and measurement uncertainty, we combine this approach with borrowing of information through the addition of an individual-specific random effect. We present different ways of using this formulation for prediction purposes and show how it may also allow for variable selection. We evaluate the proposed methods by comparing their predictive performance with that achieved using the complete information, as well as with alternative methods for dealing with the limit of detection.
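As a rough illustration of replacing left-censored intensities with an expected value, the sketch below assumes that the (log-)intensity of a single peak is normally distributed with known mean `mu` and standard deviation `sigma`. Under that assumption the expected value given that the measurement fell below the limit of detection is the standard truncated-normal mean, E[X | X < LOD] = mu - sigma * phi(a) / Phi(a) with a = (LOD - mu) / sigma. The normality assumption, the known parameters, and the function names are all hypothetical choices made for this example. It is not the authors' implementation, and it omits the individual-specific random effect described above.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_below_lod(mu, sigma, lod):
    """E[X | X < lod] for X ~ Normal(mu, sigma^2) (truncated-normal mean).

    Assumes mu and sigma are known; in practice they would have to be
    estimated from the censored data.
    """
    a = (lod - mu) / sigma
    return mu - sigma * normal_pdf(a) / normal_cdf(a)


def impute_censored(intensities, mu, sigma, lod):
    """Replace censored values (encoded as None) by the conditional mean."""
    fill = expected_below_lod(mu, sigma, lod)
    return [fill if x is None else x for x in intensities]


# Hypothetical peak: mean 2.0, sd 1.0, limit of detection 1.0.
observed = [2.4, None, 1.7, None, 3.1]
completed = impute_censored(observed, mu=2.0, sigma=1.0, lod=1.0)
```

The imputed value always lies below the limit of detection, which distinguishes this approach from substituting a fixed constant such as the limit itself.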