Division of Epidemiology, Statistics and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, DHHS, 6100 Executive Boulevard, Bethesda, MD 20852, USA.
Acad Radiol. 2013 Jul;20(7):838-46. doi: 10.1016/j.acra.2013.04.001.
Biomarkers are of ever-increasing importance to clinical practice and epidemiologic research, and multiple biomarkers are often measured per patient. Measurement of true biomarker levels is limited by laboratory precision: relatively low or high levels can fall below or above a limit of detection (LOD) and go unobserved. Ignoring these missing observations or replacing them with a constant is common practice, although both approaches have been shown to bias estimates of several parameters of interest, including the area under the receiver operating characteristic (ROC) curve and regression coefficients.
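The bias from the two naive approaches can be illustrated with a small simulation. The sketch below is not taken from the study: it assumes a single normally distributed biomarker with a lower LOD, and the function name `naive_means`, the parameter values, and the seed are all hypothetical choices made for illustration.

```python
import random
from statistics import fmean


def naive_means(true_mean=0.0, true_sd=1.0, lod=-0.5, n=20000, seed=1):
    """Compare the sample mean under two naive treatments of values below a LOD.

    All defaults are hypothetical illustration values, not data from the study.
    """
    rng = random.Random(seed)
    values = [rng.gauss(true_mean, true_sd) for _ in range(n)]

    # Complete-case analysis: drop every value below the LOD.
    detected = [v for v in values if v >= lod]

    # Constant substitution: replace every value below the LOD with the LOD itself.
    substituted = [v if v >= lod else lod for v in values]

    return {
        "full_data": fmean(values),          # unobservable in practice
        "complete_case": fmean(detected),
        "substitution": fmean(substituted),
    }


result = naive_means()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions both naive means sit above the mean of the uncensored data, with the complete-case mean shifted the most, because the lower tail is either discarded or pulled up to the LOD.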
We developed asymptotically consistent, efficient estimators, via maximum likelihood techniques, for the mean vector and covariance matrix of multivariate normally distributed biomarkers affected by LOD. We also developed an approximation for the Fisher information and covariance matrix of our maximum likelihood estimators (MLEs). We applied these results to an ROC curve setting, generating an MLE for the area under the curve for the best linear combination of multiple biomarkers, with an accompanying confidence interval.
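To make the idea of likelihood-based estimation under a LOD concrete, the sketch below handles the simplest case only: one normally distributed biomarker with a lower LOD. It uses an expectation-maximization (EM) iteration, which is a substitute technique chosen here for brevity and not the authors' multivariate procedure; in this univariate setting its fixed point maximizes the censored-data likelihood. The function name `censored_normal_mle` and the simulated data are hypothetical.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def censored_normal_mle(detected, n_censored, lod, iters=500, tol=1e-10):
    """EM estimate of (mean, sd) for a normal sample left-censored at `lod`.

    `detected` holds the observed values; `n_censored` counts values known
    only to lie below `lod`. Illustrative univariate sketch.
    """
    n = len(detected) + n_censored
    s1 = sum(detected)
    s2 = sum(v * v for v in detected)

    # Start from the (biased) complete-case moments.
    mu = fmean(detected)
    sigma = (s2 / len(detected) - mu ** 2) ** 0.5

    for _ in range(iters):
        # E-step: first and second moments of X given X < lod.
        alpha = (lod - mu) / sigma
        lam = STD_NORMAL.pdf(alpha) / STD_NORMAL.cdf(alpha)
        m1 = mu - sigma * lam
        m2 = sigma ** 2 * (1 - alpha * lam - lam ** 2) + m1 ** 2

        # M-step: update the mean and sd from the completed sufficient statistics.
        new_mu = (s1 + n_censored * m1) / n
        new_sigma = ((s2 + n_censored * m2) / n - new_mu ** 2) ** 0.5

        converged = abs(new_mu - mu) + abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


# Hypothetical data: N(0, 1) values with a lower LOD at -0.5.
rng = random.Random(7)
sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
lod = -0.5
detected = [v for v in sample if v >= lod]
mu_hat, sigma_hat = censored_normal_mle(detected, len(sample) - len(detected), lod)
print(f"mean: {mu_hat:.3f}, sd: {sigma_hat:.3f}")
```

The estimates should land close to the generating values of 0 and 1, whereas the complete-case moments used as starting values do not. Extending this to several correlated biomarkers, each with its own LOD, requires the multivariate likelihood described in the text.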
Point and confidence interval estimates were evaluated by simulation study; bias and root mean square error for the point estimates, and coverage probability for the intervals, displayed behavior consistent with MLEs. An example using three polychlorinated biphenyls to classify women with and without endometriosis illustrates how the underlying distribution of multiple biomarkers with LOD can be assessed and shows increased discriminatory ability over naïve methods.
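Once mean vectors and covariance matrices are available for cases and controls, the AUC of the best linear combination follows from a standard closed form for multivariate normal markers: the weights are proportional to (Σ_cases + Σ_controls)⁻¹(μ_cases − μ_controls), and the AUC is Φ of the square root of the corresponding quadratic form. The sketch below assumes this binormal result applies; the helper names and the two-marker input values are hypothetical and are not the study's estimates.

```python
from statistics import NormalDist


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def best_linear_auc(mu_cases, mu_controls, cov_cases, cov_controls):
    """Return (weights, AUC) for the best linear combination under normality."""
    delta = [a - b for a, b in zip(mu_cases, mu_controls)]
    summed = [[x + y for x, y in zip(r1, r2)]
              for r1, r2 in zip(cov_cases, cov_controls)]
    weights = solve(summed, delta)                      # (S1 + S0)^-1 (mu1 - mu0)
    quad = sum(w * d for w, d in zip(weights, delta))   # quadratic form
    return weights, NormalDist().cdf(quad ** 0.5)


# Hypothetical two-marker inputs, for illustration only.
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, auc = best_linear_auc([1.0, 1.0], [0.0, 0.0], identity, identity)
print(weights, round(auc, 4))
```

Plugging in estimates that account for the LOD, in place of moments computed from substituted or complete-case data, is what changes the resulting AUC.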
Properly addressing LODs can lead to optimal biomarker combinations with increased discriminatory ability that may have been ignored because of measurement obstacles.