The Dartmouth Institute for Health Policy and Clinical Practice.
Norris Cotton Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, NH.
Med Care. 2018 Dec;56(12):e83-e89. doi: 10.1097/MLR.0000000000000875.
In an effort to overcome quality and cost constraints inherent in population-based research, diverse data sources are increasingly being combined. In this paper, we describe the performance of a Medicare claims-based incident cancer identification algorithm in comparison with observational cohort data from the Nurses' Health Study (NHS).
NHS-Medicare linked participants' claims data were analyzed using 4 versions of a cancer identification algorithm across 3 cancer sites (breast, colorectal, and lung). The algorithms evaluated included an update of the original Setoguchi algorithm, and 3 other versions that differed in the data used for prevalent cancer exclusions.
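The difference between the prevalent-exclusion variants can be sketched as a rule that consults claims history, cohort self-report, or both before counting a claims-flagged cancer as incident. This is a minimal illustration only: the `Participant` fields and the `is_incident` function are hypothetical names, not the study's implementation, and the sketch does not reproduce the updated Setoguchi algorithm.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # All field names are hypothetical, chosen for illustration.
    claims_flag_incident: bool    # claims algorithm flagged a cancer
    prior_cancer_in_claims: bool  # earlier cancer evidence in claims
    prior_cancer_in_cohort: bool  # cancer self-reported to the cohort study


def is_incident(p: Participant, use_claims: bool, use_cohort: bool) -> bool:
    """Return True if a claims-flagged cancer survives prevalent-case exclusion."""
    if not p.claims_flag_incident:
        return False
    # Exclude as prevalent if any enabled data source shows a prior cancer.
    if use_claims and p.prior_cancer_in_claims:
        return False
    if use_cohort and p.prior_cancer_in_cohort:
        return False
    return True


# A participant whose prior cancer is visible only in cohort self-report.
example = Participant(
    claims_flag_incident=True,
    prior_cancer_in_claims=False,
    prior_cancer_in_cohort=True,
)
claims_only = is_incident(example, use_claims=True, use_cohort=False)
cohort_only = is_incident(example, use_claims=False, use_cohort=True)
both_sources = is_incident(example, use_claims=True, use_cohort=True)
```

In this invented example, the claims-only rule keeps the case as incident, while the cohort-only and dual-source rules exclude it as prevalent.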
The algorithm with the highest positive predictive value (PPV, 0.52-0.82) and κ statistic (0.62-0.87) for identifying incident cancer cases used both Medicare claims and observational cohort (NHS) data to remove prevalent cases. The algorithm that used only NHS data to remove prevalent cases performed nearly as well (PPV, 0.50-0.79; κ, 0.61-0.85), whereas the version that used only claims to remove prevalent cases performed substantially worse than the dual-source algorithm (PPV, 0.42-0.60; κ, 0.54-0.70).
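For readers less familiar with the two reported metrics, the sketch below computes PPV and Cohen's κ from a 2x2 table that compares algorithm-identified cases with the cohort reference standard. The function name and the counts are invented for illustration and are not taken from the study.

```python
def ppv_and_kappa(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """PPV and Cohen's kappa for a 2x2 table (algorithm vs. reference standard)."""
    n = tp + fp + fn + tn
    # PPV: share of algorithm-flagged cases confirmed by the reference standard.
    ppv = tp / (tp + fp)
    # Observed agreement: both sources say "case" or both say "non-case".
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return ppv, kappa


# Invented counts, for illustration only.
ppv, kappa = ppv_and_kappa(tp=40, fp=10, fn=10, tn=40)
```

With these invented counts, PPV is 0.80 and κ is 0.60: κ is lower than the raw agreement of 0.80 because it discounts the agreement expected by chance (0.50 here).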
Our findings suggest claims-based algorithms identify incident cancer with variable reliability when measured against an observational cohort study reference standard. Self-reported baseline information available in cohort studies is more effective in removing prevalent cancer cases than are claims data algorithms. Use of claims-based algorithms should be tailored to the research question at hand and the nature of available observational cohort data.