Classifying Stage IV Lung Cancer From Health Care Claims: A Comparison of Multiple Analytic Approaches.

Authors

Brooks Gabriel A, Bergquist Savannah L, Landrum Mary Beth, Rose Sherri, Keating Nancy L

Affiliations

Geisel School of Medicine, Lebanon, NH.

Harvard University, Cambridge, MA.

Publication information

JCO Clin Cancer Inform. 2019 May;3:1-19. doi: 10.1200/CCI.18.00156.

Abstract

PURPOSE

Cancer stage is a key determinant of outcomes; however, stage is not available in claims-based data sources used for real-world evaluations. We compare multiple methods for classifying lung cancer stage from claims data.

METHODS

Our study used linked SEER-Medicare data. The patient samples included fee-for-service Medicare beneficiaries who received chemotherapy and were diagnosed with lung cancer from 2010 to 2011 (development cohort) or from 2012 to 2013 (validation cohort). Classification algorithms considered Medicare Part A and B claims for care in the 3 months before and after chemotherapy initiation. We developed a clinical algorithm to predict stage IV (versus stage I to III) cancer on the basis of treatment patterns (surgery, radiotherapy, chemotherapy). We also considered an ensemble of claims-based machine learning algorithms. Classification methods were trained in the development cohort, and performance was measured in both cohorts. The SEER data were the gold standard for cancer stage.

RESULTS

Development and validation cohorts included 14,760 and 14,620 patients with lung cancer, respectively. Validation analyses assessed clinical, random forest, and simple logistic regression algorithms. The best-performing classifier within the development cohort was the random forest, but this performance was not replicated in the validation analysis. Logistic regression had stable performance across cohorts. Compared with the clinical algorithm, the 14-variable logistic regression algorithm demonstrated higher accuracy in both the development cohort (77% versus 71%) and the validation cohort (77% versus 73%), with improved specificity for stage IV disease.
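The accuracy and specificity figures above are measured against a gold-standard stage label. As a minimal sketch of how such metrics are computed, assuming binary labels where True means stage IV, the function below tallies a confusion matrix in plain Python. The function name and the toy labels are hypothetical illustrations, not the study's code or data.

```python
import math


def classification_metrics(gold, predicted):
    """Accuracy, sensitivity, and specificity for a binary classifier.

    gold, predicted: equal-length sequences of booleans (True = stage IV).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one observation is required")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # stage IV, called stage IV
    tn = sum(1 for g, p in pairs if not g and not p)  # stage I-III, called I-III
    fp = sum(1 for g, p in pairs if not g and p)      # stage I-III, called stage IV
    fn = sum(1 for g, p in pairs if g and not p)      # stage IV, called I-III

    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else math.nan,
        "specificity": tn / (tn + fp) if (tn + fp) else math.nan,
    }


# Hypothetical toy labels, invented purely for illustration.
toy_gold = [True, True, True, False, False, False, False, True]
toy_predicted = [True, True, False, False, True, False, False, True]
toy_metrics = classification_metrics(toy_gold, toy_predicted)
```

Running the same function on predictions for the development cohort and for the validation cohort would give the kind of side-by-side comparison the results describe.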

CONCLUSION

Machine learning algorithms have potential to improve lung cancer stage classification but may be prone to overfitting. Use of ensembles, cross-validation, and external validation can aid generalizability. Degradation of accuracy between development and validation cohorts suggests the need for caution in implementing machine learning in research or care delivery.
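Cross-validation, mentioned above as a guard against overfitting, repeatedly holds out part of the training data for evaluation. The sketch below generates k-fold index splits in plain Python. It is an illustration under stated assumptions, not the procedure used in the study: the function name is hypothetical, and rows are assumed to be already shuffled.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, held_out_indices) for k contiguous folds over n rows.

    Assumes rows are already in random order; shuffle beforehand if they are not.
    """
    if not 1 < k <= n:
        raise ValueError("k must satisfy 1 < k <= n")

    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]

    start = 0
    for size in fold_sizes:
        stop = start + size
        held_out = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n))
        yield train, held_out
        start = stop


# Example: 10 rows split into 3 folds.
example_folds = list(k_fold_indices(10, 3))
```

Each model is fitted on the training indices and scored on the held-out indices, and the held-out scores are averaged. External validation, such as a cohort from later diagnosis years, remains a separate check on top of this.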
