Freeman J L, Zhang D, Freeman D H, Goodwin J S
Division of Geriatric Medicine, Department of Internal Medicine, University of Texas Medical Branch, Galveston, TX 77555, USA.
J Clin Epidemiol. 2000 Jun;53(6):605-14. doi: 10.1016/s0895-4356(99)00173-0.
This study developed and evaluated a method for ascertaining newly diagnosed breast cancer cases using multiple sources of data from the Medicare claims system. Predictors of an incident case were operationally defined as codes for breast cancer-related diagnoses and procedures from hospital inpatient, hospital outpatient, and physician claims. The optimal combination of predictors was then determined from a logistic regression model using 1992 data from the linked SEER registries-Medicare claims database and a sample of noncancer controls drawn from the SEER areas. While the ROC curve demonstrates that the model can produce levels of sensitivity and specificity above 90%, the positive predictive value is comparatively low (67-70%). This low predictive value results largely from the model's limited ability to distinguish recurrent and secondary malignancies from incident cases, and possibly from the model identifying true incident cases not identified by SEER. Nevertheless, the logistic regression approach is a useful method for ascertaining incident cases because its performance characteristics can be changed by selecting different cut-points depending on the application (e.g., high sensitivity for registry validation, high specificity for outcomes research). It also allows specific adjustments to be made to population-based estimates of breast cancer incidence derived from claims.
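The cut-point trade-off described above can be illustrated with a minimal sketch. It assumes a fitted model has already produced a predicted probability for each person; the probabilities, labels, cut-points, and function names below are invented for illustration and are not the study's data or the authors' code.

```python
from typing import Dict, List, Tuple


def confusion_counts(
    probs: List[float], labels: List[int], cut_point: float
) -> Tuple[int, int, int, int]:
    """Count TP, FP, TN, FN when a case is flagged if prob >= cut_point."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        flagged = p >= cut_point
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def performance(
    probs: List[float], labels: List[int], cut_point: float
) -> Dict[str, float]:
    """Sensitivity, specificity, and positive predictive value at one cut-point."""
    tp, fp, tn, fn = confusion_counts(probs, labels, cut_point)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "ppv": tp / (tp + fp) if (tp + fp) else nan,
    }


# Hypothetical toy data: predicted probabilities and reference-standard labels
# (1 = incident case in the registry, 0 = noncancer control).
probs = [0.95, 0.80, 0.60, 0.40, 0.70, 0.30, 0.20, 0.10, 0.05, 0.02]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# A low cut-point favors sensitivity; a high cut-point favors specificity.
for cut in (0.35, 0.75):
    print(cut, performance(probs, labels, cut))
```

In this toy data, lowering the cut-point captures every true case at the cost of a false positive, while raising it removes false positives but misses true cases. This mirrors the choice between registry validation and outcomes research uses.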