Haas School of Business, University of California, Berkeley, Berkeley, California, USA.
The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine, Hanover, New Hampshire, USA.
Stat Med. 2022 Aug 30;41(19):3772-3788. doi: 10.1002/sim.9448. Epub 2022 Jun 8.
The difficulty in identifying cancer stage in health care claims data has limited oncology quality of care and health outcomes research. We fit prediction algorithms for classifying lung cancer stage into three classes (stages I/II, stage III, and stage IV) using claims data, and then demonstrate a method for incorporating the classification uncertainty in survival estimation. Leveraging set-valued classification and split conformal inference, we show how a fixed algorithm developed in one cohort of data may be deployed in another, while rigorously accounting for uncertainty from the initial classification step. We demonstrate this process using SEER cancer registry data linked with Medicare claims data.
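The abstract relies on split conformal inference to turn a fixed classifier's output into a set of plausible stages. Below is a minimal sketch of the generic, textbook split conformal procedure for a three-class problem (I/II, III, IV). It is an illustration under stated assumptions, not the authors' exact method. The sketch assumes that the classifier outputs class probabilities and that a labeled calibration sample is available which is exchangeable with the deployment data. It also assumes the nonconformity score "one minus the predicted probability of the true class". The function names `conformal_threshold` and `prediction_sets` are hypothetical, and the data in the usage lines are synthetic.

```python
import math
import numpy as np

STAGES = ["I/II", "III", "IV"]  # the three stage classes named in the abstract

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Hypothetical helper: split conformal threshold from a held-out calibration set.

    cal_probs:  (n, K) class probabilities from a classifier that is already fit (held fixed).
    cal_labels: (n,) true class indices in {0, ..., K-1}.
    Sets {k : 1 - p_k <= q_hat} then have marginal coverage >= 1 - alpha,
    assuming calibration and test points are exchangeable.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    # Assumed nonconformity score: one minus the probability given to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank ceil((n + 1)(1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return np.inf  # too few calibration points: every set is the full label set
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, q_hat):
    """Hypothetical helper: for each row, the class indices whose score is <= q_hat."""
    test_probs = np.asarray(test_probs, dtype=float)
    return [[int(k) for k in np.flatnonzero(1.0 - row <= q_hat)] for row in test_probs]

if __name__ == "__main__":
    # Synthetic calibration data only; no registry or claims data are used here.
    rng = np.random.default_rng(0)
    cal_probs = rng.dirichlet([2.0, 1.0, 1.0], size=500)
    # Draw labels from the predicted probabilities so the toy model is calibrated.
    cal_labels = np.array([rng.choice(3, p=p) for p in cal_probs])
    q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
    test_probs = np.array([[0.90, 0.07, 0.03], [0.40, 0.35, 0.25]])
    for probs, s in zip(test_probs, prediction_sets(test_probs, q_hat)):
        print(probs, [STAGES[k] for k in s])
```

A confident prediction tends to yield a single stage, and an ambiguous one yields a set of two or three stages. A downstream analysis, such as the survival estimation the abstract describes, can carry that set forward instead of a single forced label. The guarantee is marginal (averaged over patients), not conditional on each stage.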