Development of Lung Cancer Risk Prediction Machine Learning Models for Equitable Learning Health System: Retrospective Study.

Authors

Chen Anjun, Wu Erman, Huang Ran, Shen Bairong, Han Ruobing, Wen Jian, Zhang Zhiyong, Li Qinghua

Affiliations

School of Public Health, Guilin Medical University, Guilin, China.

West China Hospital, Chengdu, China.

Publication information

JMIR AI. 2024 Sep 11;3:e56590. doi: 10.2196/56590.

Abstract

BACKGROUND

A significant proportion of young at-risk patients and nonsmokers are excluded by the current guidelines for lung cancer (LC) screening, resulting in low screening adoption. The vision of the US National Academy of Medicine to transform health systems into learning health systems (LHS) holds promise for bringing necessary structural changes to health care, thereby addressing the exclusivity and adoption issues of LC screening.

OBJECTIVE

This study aims to realize the LHS vision by designing an equitable, machine learning (ML)-enabled LHS unit for LC screening. It focuses on developing an inclusive and practical LC risk prediction model, suitable for initializing the ML-enabled LHS (ML-LHS) unit. This model aims to empower primary physicians in a clinical research network, linking central hospitals and rural clinics, to routinely deliver risk-based screening for enhancing LC early detection in broader populations.

METHODS

We created a standardized data set of health factors from 1397 patients with LC and 1448 control patients, all aged 30 years and older, including both smokers and nonsmokers, from a hospital's electronic medical record system. Initially, a data-centric ML approach was used to create inclusive ML models for risk prediction from all available health factors. Subsequently, a quantitative distribution of LC health factors was used in feature engineering to refine the models into a more practical model with fewer variables.
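As a rough illustration of how a distribution of health factors could guide variable reduction, the sketch below ranks binary factors by the gap between their prevalence in cases and in controls, then keeps the top k. This is an assumption made for illustration, not the authors' documented procedure; the function names, the factor names, and the toy records are all hypothetical.

```python
from typing import Dict, List

Record = Dict[str, int]  # hypothetical format: factor name -> 1 (present) or 0 (absent)


def prevalence(records: List[Record], factor: str) -> float:
    """Fraction of records in which the factor is present."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(factor, 0)) / len(records)


def rank_factors(cases: List[Record], controls: List[Record],
                 factors: List[str]) -> List[str]:
    """Order factors by absolute prevalence gap between cases and controls."""
    gaps = {f: abs(prevalence(cases, f) - prevalence(controls, f)) for f in factors}
    # sorted() is stable, so tied factors keep their input order.
    return sorted(factors, key=lambda f: gaps[f], reverse=True)


def select_top_k(cases: List[Record], controls: List[Record],
                 factors: List[str], k: int) -> List[str]:
    """Keep the k factors with the largest prevalence gap."""
    return rank_factors(cases, controls, factors)[:k]


# Toy, made-up records (not study data).
cases = [
    {"cough": 1, "smoker": 1},
    {"cough": 1, "smoker": 0},
    {"cough": 1, "smoker": 1},
    {"cough": 0, "smoker": 0},
]
controls = [
    {"cough": 0, "smoker": 1},
    {"cough": 0, "smoker": 0},
    {"cough": 1, "smoker": 0},
    {"cough": 0, "smoker": 1},
]
selected = select_top_k(cases, controls, ["cough", "smoker", "hypertension"], k=1)
print(selected)
```

In practice, a reduced variable list like this would be used to retrain the classifier and compare its metrics against the full model.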

RESULTS

The initial inclusive 250-variable XGBoost model for LC risk prediction achieved performance metrics of 0.86 recall, 0.90 precision, and 0.89 accuracy. After feature refinement, a practical 29-variable XGBoost model was developed, displaying performance metrics of 0.80 recall, 0.82 precision, and 0.82 accuracy. This model met the criteria for initializing the ML-LHS unit for risk-based, inclusive LC screening within clinical research networks.
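For readers less familiar with these metrics, the following minimal sketch shows how recall, precision, and accuracy are derived from confusion-matrix counts. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # LC patients predicted as LC
    fp: int  # controls predicted as LC
    tn: int  # controls predicted as non-LC
    fn: int  # LC patients predicted as non-LC


def recall(c: ConfusionCounts) -> float:
    """Share of true LC patients that the model flags."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def precision(c: ConfusionCounts) -> float:
    """Share of flagged patients who truly have LC."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def accuracy(c: ConfusionCounts) -> float:
    """Share of all predictions that are correct."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


# Hypothetical counts for illustration only.
counts = ConfusionCounts(tp=80, fp=20, tn=80, fn=20)
print(recall(counts), precision(counts), accuracy(counts))
```

Recall matters most for a screening setting, because a missed LC patient (a false negative) is typically costlier than an unnecessary follow-up.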

CONCLUSIONS

This study designed an innovative ML-LHS unit for a clinical research network, aiming to sustainably provide inclusive LC screening to all at-risk populations. It developed an inclusive and practical XGBoost model from hospital electronic medical record data, capable of initializing such an ML-LHS unit for community and rural clinics. The anticipated deployment of this ML-LHS unit is expected to significantly improve LC screening rates and early detection among broader populations, including those typically overlooked by existing screening guidelines.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9042/11425024/90025288dbd2/ai_v3i1e56590_fig1.jpg