A Comparison of Logistic Regression Against Machine Learning Algorithms for Gastric Cancer Risk Prediction Within Real-World Clinical Data Streams.

Author information

Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Stanford, CA.

Division of Gastroenterology, University of Washington, Seattle, WA.

Publication information

JCO Clin Cancer Inform. 2022 Jun;6:e2200039. doi: 10.1200/CCI.22.00039.

Abstract

PURPOSE

Noncardia gastric cancer (NCGC) is a leading cause of global cancer mortality and is often diagnosed at advanced stages. Development of NCGC risk models within electronic health records (EHR) may allow for improved cancer prevention. There has been much recent interest in the use of machine learning (ML) for cancer prediction, but few studies have compared ML with classical statistical models for NCGC risk prediction.

METHODS

We trained models using logistic regression (LR) and four commonly used ML algorithms to distinguish NCGC cases from age- and sex-matched controls in two EHR systems: Stanford University and the University of Washington (UW). The LR model contained well-established NCGC risk factors (intestinal metaplasia histology, prior Helicobacter pylori infection, race, ethnicity, nativity status, smoking history, anemia), whereas ML models agnostically selected variables from the EHR. Models were developed and internally validated in the Stanford data, and externally validated in the UW data. Hyperparameter tuning of models was achieved using cross-validation. Model performance was compared by accuracy, sensitivity, and specificity.
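As an illustration of the three evaluation criteria named above, the following minimal sketch computes accuracy, sensitivity, and specificity from binary case/control labels. The function name `classification_metrics` and the toy labels are hypothetical and are not taken from the study; the abstract does not describe the authors' implementation.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity.

    Labels are assumed to be coded 1 = case (NCGC) and 0 = control.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases flagged as cases
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls flagged as controls
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls flagged as cases
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")

    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


# Hypothetical toy data: four cases followed by four controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy example two of four cases are detected and three of four controls are correctly excluded, so sensitivity and specificity differ even though both feed into the single accuracy figure.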

RESULTS

In internal validation, LR performed with comparable accuracy (0.732; 95% CI, 0.698 to 0.764), sensitivity (0.697; 95% CI, 0.647 to 0.744), and specificity (0.767; 95% CI, 0.720 to 0.809) to penalized lasso, support vector machine, K-nearest neighbor, and random forest models. In external validation, LR continued to demonstrate high accuracy, sensitivity, and specificity. Although K-nearest neighbor demonstrated higher accuracy and specificity, this was offset by significantly lower sensitivity. No ML model consistently outperformed LR across evaluation criteria.
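Each metric above is a proportion reported with a 95% CI. The abstract does not state how the intervals were computed; the sketch below uses the Wilson score interval as one common choice, which is an assumption and not the authors' stated method. The function name `wilson_interval` and the counts in the usage lines are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion.

    z defaults to the standard normal quantile for a two-sided 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")

    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: 70 of 100 cases correctly identified (sensitivity 0.70).
low, high = wilson_interval(70, 100)
print(f"sensitivity 0.700; 95% CI, {low:.3f} to {high:.3f}")
```

The same function applies to accuracy and specificity by passing the corresponding numerator and denominator counts.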

CONCLUSION

Drawing data from two independent EHRs, we find LR on the basis of established risk factors demonstrated comparable performance to optimized ML algorithms. This study demonstrates that classical models built on robust, hand-chosen predictor variables may not be inferior to data-driven models for NCGC risk prediction.
