Yu Binbing
Laboratory of Epidemiology, Demography and Biometry, National Institute on Aging, Bethesda, Maryland 20892, U.S.A.
J Appl Stat. 2009 Jul 7;36(7):769-778. doi: 10.1080/0266476YYxxxxxxxx.
In disease screening and diagnosis, multiple markers are often measured and combined to improve diagnostic accuracy. McIntosh and Pepe (2002, Biometrics 58, 657-664) showed that the risk score, defined as the probability of disease conditional on multiple markers, is the optimal function for classification based on the Neyman-Pearson Lemma. They proposed a two-step procedure to approximate the risk score. However, the resulting ROC curve is defined only on a subrange (L, h) of the false-positive rates in (0,1), and determining the lower limit L requires extra prior information. In practice, most diagnostic tests are imperfect, and a single marker is rarely uniformly better than the others. Using simulation, I show that multivariate adaptive regression splines (MARS) are a useful tool for approximating the risk score when combining multiple markers, especially when the ROC curves of the individual tests cross. The resulting ROC curve is defined over the whole range (0,1), and the method is easy to implement and has an intuitive interpretation. Sample code for the application is given in the appendix.
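The optimality of the risk score can be illustrated with a minimal simulation sketch. This is not the appendix code and does not fit MARS: it assumes the marker distributions are known, so the risk score is computed exactly from the likelihood ratio instead of being approximated. The two independent normal markers, their parameters (chosen so that the single-marker ROC curves cross), the prevalence of 0.5, and all function names are assumptions made for illustration only.

```python
import math
import random


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    return -math.log(sd) - 0.5 * math.log(2 * math.pi) - 0.5 * ((x - mean) / sd) ** 2


def risk_score(x1, x2, prevalence=0.5):
    """P(disease | x1, x2) under the ASSUMED model below (independent markers).

    Diseased:     x1 ~ N(1, 1),  x2 ~ N(1.5, 2^2)
    Non-diseased: x1 ~ N(0, 1),  x2 ~ N(0, 1)
    """
    log_lr = (normal_logpdf(x1, 1.0, 1.0) - normal_logpdf(x1, 0.0, 1.0)
              + normal_logpdf(x2, 1.5, 2.0) - normal_logpdf(x2, 0.0, 1.0))
    log_odds = math.log(prevalence / (1.0 - prevalence)) + log_lr
    return 1.0 / (1.0 + math.exp(-log_odds))


def simulate(n, diseased, rng):
    """Draw n (x1, x2) pairs from the assumed model."""
    if diseased:
        return [(rng.gauss(1.0, 1.0), rng.gauss(1.5, 2.0)) for _ in range(n)]
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


def auc(pos, neg):
    """Area under the empirical ROC curve (Mann-Whitney form, ties ignored)."""
    labelled = sorted([(s, 1) for s in pos] + [(s, 0) for s in neg])
    rank_sum = sum(i + 1 for i, (_, lab) in enumerate(labelled) if lab == 1)
    n1, n0 = len(pos), len(neg)
    return (rank_sum - n1 * (n1 + 1) / 2.0) / (n1 * n0)


def roc_curve(pos, neg):
    """Empirical ROC points (FPR, TPR), from (0, 0) to (1, 1)."""
    labelled = sorted([(s, 1) for s in pos] + [(s, 0) for s in neg], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, lab in labelled:
        if lab == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / len(neg), tp / len(pos)))
    return points


rng = random.Random(2009)  # fixed seed so the run is reproducible
cases = simulate(1000, True, rng)
controls = simulate(1000, False, rng)

auc_marker1 = auc([c[0] for c in cases], [c[0] for c in controls])
auc_marker2 = auc([c[1] for c in cases], [c[1] for c in controls])
risk_cases = [risk_score(*c) for c in cases]
risk_controls = [risk_score(*c) for c in controls]
auc_risk = auc(risk_cases, risk_controls)
roc = roc_curve(risk_cases, risk_controls)

print("AUC marker 1:  %.3f" % auc_marker1)
print("AUC marker 2:  %.3f" % auc_marker2)
print("AUC risk score: %.3f" % auc_risk)
```

In this setup the ROC curve of the risk score runs over the full range of false-positive rates, from (0, 0) to (1, 1). In real data the risk score is unknown and has to be estimated, which is the role MARS plays in the approach described above.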