Department of Statistics, George Mason University, Fairfax, VA 22030, U.S.A.
Stat Med. 2014 Apr 15;33(8):1307-20. doi: 10.1002/sim.6027. Epub 2013 Oct 17.
In this paper, we consider the combination of markers measured with and without a limit of detection (LOD). An LOD is often encountered when measuring proteomic markers: because the detecting ability of the equipment or instrument is limited, markers at relatively low levels are difficult to measure. Suppose that, after some monotonic transformation, the marker values approximately follow multivariate normal distributions. We propose to estimate the distribution parameters while taking the LOD into account, and then to combine the markers using the results from linear discriminant analysis. Our simulation results show that the ROC curve parameter estimates generated by the proposed method are much closer to the truth than those obtained by simply using linear discriminant analysis to combine markers without considering the LOD. In addition, we propose a procedure to select and combine a subset of markers when many candidate markers are available. The procedure, which is based on the correlation among markers, differs from the common understanding that a subset of the most accurate markers should be selected for the combination. The simulation studies show that the accuracy of a combined marker can be strongly affected by the correlation among marker measurements. Our methods are applied to a protein pathway dataset to combine proteomic biomarkers that distinguish cancer patients from non-cancer patients.
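The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on simplifying assumptions that the abstract does not state: a single marker left-censored at a known LOD, fitted by a standard EM algorithm for a censored normal, and a two-marker linear discriminant combination under a common covariance matrix for cases and controls. The function names `fit_left_censored_normal` and `lda_combination` are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for pdf and cdf


def fit_left_censored_normal(observed, n_censored, lod, n_iter=300):
    """EM estimates (mu, sigma) for a normal marker left-censored at `lod`.

    `observed` holds the values at or above the LOD; `n_censored` counts the
    values reported only as "below LOD". Illustrative univariate sketch.
    """
    n = len(observed) + n_censored
    s1 = sum(observed)
    s2 = sum(x * x for x in observed)
    # Start from the naive estimates that ignore the censored values.
    mu = s1 / len(observed)
    sigma = max((s2 / len(observed) - mu * mu) ** 0.5, 1e-6)
    for _ in range(n_iter):
        a = (lod - mu) / sigma
        lam = STD.pdf(a) / STD.cdf(a)
        # E-step: first and second moments of X given X < lod.
        e1 = mu - sigma * lam
        e2 = mu * mu + sigma * sigma - sigma * lam * (lod + mu)
        # M-step: update mean and variance from the completed moments.
        mu_new = (s1 + n_censored * e1) / n
        var_new = (s2 + n_censored * e2) / n - mu_new * mu_new
        mu, sigma = mu_new, max(var_new, 1e-12) ** 0.5
    return mu, sigma


def lda_combination(delta, sd, rho):
    """LDA coefficients and binormal AUC for two markers.

    Assumes (for illustration) a common covariance matrix in cases and
    controls, with standard deviations `sd`, correlation `rho`, and
    case-minus-control mean difference `delta`.
    """
    d1, d2 = delta
    s1, s2 = sd
    det = (s1 * s2) ** 2 * (1.0 - rho * rho)
    # Coefficients a = Sigma^{-1} delta, using the closed-form 2x2 inverse.
    a1 = (s2 * s2 * d1 - rho * s1 * s2 * d2) / det
    a2 = (s1 * s1 * d2 - rho * s1 * s2 * d1) / det
    mahalanobis_sq = a1 * d1 + a2 * d2
    # AUC of the combined score a'X under the binormal model.
    auc = STD.cdf((mahalanobis_sq / 2.0) ** 0.5)
    return (a1, a2), auc
```

For example, `fit_left_censored_normal(values_above_lod, n_below, lod)` can be compared with the mean obtained by substituting the LOD for every censored value, and calling `lda_combination((1, 1), (1, 1), rho)` for several values of `rho` shows how the AUC of the combined marker changes with the correlation even when each marker's individual accuracy is held fixed.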