Machine Learning Methods to Identify Missed Cases of Bladder Cancer in Population-Based Registries.

Affiliations

Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD.

University of North Carolina Lineberger Comprehensive Cancer Center, Chapel Hill, NC.

Publication information

JCO Clin Cancer Inform. 2021 Jun;5:641-653. doi: 10.1200/CCI.20.00170.

Abstract

PURPOSE

Population-based cancer incidence rates of bladder cancer may be underestimated. Accurate estimates are needed for understanding the burden of bladder cancer in the United States. We developed and evaluated the feasibility of a machine learning-based classifier to identify bladder cancer cases missed by cancer registries, and estimated the rate of bladder cancer cases potentially missed.

METHODS

Data were from a population-based cohort of 37,940 bladder cancer cases aged 65 years and older in the SEER cancer registries linked with Medicare claims (2007-2013). Cases with other urologic cancers, abdominal cancers, and unrelated cancers were included as control groups. A cohort of cancer-free controls was also selected using the Medicare 5% random sample. We used five supervised machine learning methods to predict bladder cancer: classification and regression trees, random forest, logic regression, support vector machines, and logistic regression.
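As a rough illustration of the classification-tree idea named in the methods above, the following is a minimal pure-Python sketch of a tree grown by Gini impurity on binary claim indicators. Everything in it is an assumption made for illustration: the feature names, the eight toy patients, the depth limit, and the helper functions are hypothetical, and the abstract does not describe the authors' actual features, software, or tuning. In practice a library implementation would normally be used instead.

```python
from collections import Counter

# Hypothetical binary claim indicators per patient (1 = claim present).
FEATURES = ["bcg", "cystectomy", "mitomycin", "office_visit"]

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Return (feature index, weighted impurity) of the best split, or None."""
    best = None
    for j in range(len(rows[0])):
        left = [y for x, y in zip(rows, labels) if x[j] == 0]
        right = [y for x, y in zip(rows, labels) if x[j] == 1]
        if not left or not right:
            continue  # this feature does not separate the node
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (j, score)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree greedily; a leaf is a 0/1 label, a node is (feature, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or gini(labels) == 0.0:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    j = split[0]
    left = [(x, y) for x, y in zip(rows, labels) if x[j] == 0]
    right = [(x, y) for x, y in zip(rows, labels) if x[j] == 1]
    return (
        j,
        build_tree([x for x, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([x for x, _ in right], [y for _, y in right], depth + 1, max_depth),
    )

def predict(tree, x):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        j, left, right = tree
        tree = right if x[j] == 1 else left
    return tree

# Toy data (invented): label 1 = registry-confirmed bladder cancer, 0 = control.
rows = [
    (1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, 0), (1, 0, 1, 1),
    (0, 0, 0, 1), (0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 0, 0),
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

tree = build_tree(rows, labels)
accuracy = sum(predict(tree, x) == y for x, y in zip(rows, labels)) / len(rows)
print("tree:", tree)
print("training accuracy:", accuracy)
```

In a setting like the one described, a classifier trained on registry-confirmed cases and controls would then be applied to claims for people with no registry record, and those it flags would be candidates for missed cases. The toy accuracy here is measured on the training rows only and says nothing about real-world performance.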

RESULTS

Registry linkages yielded 37,940 bladder cancer cases and 766,303 cancer-free controls. Using health insurance claims, classification and regression trees distinguished bladder cancer cases from noncancer controls with 95% accuracy. Bacille Calmette-Guerin, cystectomy, and mitomycin were the most important predictors for identifying bladder cancer. From 2007 to 2013, we estimated that up to 3,300 bladder cancer cases in the United States may have been missed by the SEER registries. Including these cases would increase the reported incidence rate by an average of 3.5%.
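To make the link between missed cases and the reported incidence rate concrete, here is a small sketch of the arithmetic. The abstract does not give the baseline case counts or population denominators behind its 3.5% figure, so the numbers below are hypothetical placeholders chosen only to show the calculation, and the function names are illustrative.

```python
def incidence_rate(cases, population, per=100_000):
    """Crude incidence rate: cases per `per` people."""
    return cases / population * per

def percent_increase(reported_cases, missed_cases):
    """Relative increase in the case count after adding missed cases.

    With an unchanged population denominator, this equals the relative
    increase in the crude incidence rate.
    """
    return 100.0 * missed_cases / reported_cases

# Hypothetical placeholder values, not taken from the study.
reported = 1_000
missed = 35
population = 2_000_000

old_rate = incidence_rate(reported, population)
new_rate = incidence_rate(reported + missed, population)
rate_change = (new_rate - old_rate) / old_rate * 100.0

print("increase in cases (%):", percent_increase(reported, missed))
print("increase in crude rate (%):", rate_change)
```

Because the population is held fixed, the percentage increase in the rate matches the percentage increase in the case count.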

CONCLUSION

SEER cancer registries may miss bladder cancer cases during routine reporting. These missed cases can be identified by leveraging Medicare claims and data analytics, leading to more accurate estimates of bladder cancer incidence.
