Data-driven prediction of prolonged air leak after video-assisted thoracoscopic surgery for lung cancer: Development and validation of machine-learning-based models using real-world data through the ePath system.

Author information

Tou Saori, Matsumoto Koutarou, Hashinokuchi Asato, Kinoshita Fumihiko, Nakaguma Hideki, Kozuma Yukio, Sugeta Rui, Nohara Yasunobu, Yamashita Takanori, Wakata Yoshifumi, Takenaka Tomoyoshi, Iwatani Kazunori, Soejima Hidehisa, Yoshizumi Tomoharu, Nakashima Naoki, Kamouchi Masahiro

Affiliations

Department of Health Care Administration and Management, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan.

Department of Surgery and Science, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan.

Publication information

Learn Health Syst. 2024 Oct 11;9(2):e10469. doi: 10.1002/lrh2.10469. eCollection 2025 Apr.

Abstract

INTRODUCTION

The reliability of data-driven predictions in real-world scenarios remains uncertain. This study aimed to develop and validate a machine-learning-based model for predicting clinical outcomes using real-world data from an electronic clinical pathway (ePath) system.

METHODS

All available data were collected from patients with lung cancer who underwent video-assisted thoracoscopic surgery at two independent hospitals utilizing the ePath system. The primary clinical outcome of interest was prolonged air leak (PAL), defined as drainage removal more than 2 days post-surgery. Data-driven prediction models were developed in a cohort of 314 patients from a university hospital applying sparse linear regression models (least absolute shrinkage and selection operator, ridge, and elastic net) and decision tree ensemble models (random forest and extreme gradient boosting). Model performance was then validated in a cohort of 154 patients from a tertiary hospital using the area under the receiver operating characteristic curve (AUROC) and calibration plots.
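As a rough illustration of the outcome definition and the discrimination metric named above, the following minimal sketch labels PAL from the day of drain removal and computes the AUROC with the rank-based (Mann-Whitney) formulation. The function names, the toy drain-removal days, and the toy predicted risks are assumptions made for illustration; they are not the study's data or code.

```python
def has_pal(drain_removal_day, threshold_days=2):
    """Label prolonged air leak: drain removed more than `threshold_days` after surgery."""
    return int(drain_removal_day > threshold_days)


def auroc(labels, scores):
    """AUROC as the probability that a random positive case receives a higher
    score than a random negative case; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort: postoperative day of drain removal and model-predicted risk.
drain_days = [1, 2, 3, 5, 2, 4]
predicted_risk = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]

labels = [has_pal(d) for d in drain_days]
toy_auroc = auroc(labels, predicted_risk)
print(labels, round(toy_auroc, 3))
```

In an external validation such as the one described, the same metric would be computed on the second hospital's cohort using a model fitted only on the first hospital's data.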

RESULTS

To mitigate bias, variables with missing data related to PAL or those with high rates of missing data were excluded from the dataset. Fivefold cross-validation indicated improved AUROCs when utilizing key variables, even post-imputation of missing data. Dichotomizing continuous variables enhanced performance, particularly when fewer variables were employed in the decision tree ensemble models. Consequently, regression models incorporating seven key variables in complete case analysis demonstrated superior discriminatory ability for both internal (AUROCs: 0.77-0.84) and external cohorts (AUROCs: 0.75-0.84). These models exhibited satisfactory calibration in both cohorts.
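The preprocessing steps mentioned above (excluding variables with high missingness, complete case analysis, and dichotomizing continuous variables) might be sketched as follows. The variable names, the 50% missingness threshold, and the age cutoff of 70 are hypothetical; the abstract does not state the thresholds used or list the seven key variables.

```python
def missing_rate(rows, var):
    """Fraction of records in which `var` is missing (None or absent)."""
    return sum(1 for r in rows if r.get(var) is None) / len(rows)


def drop_high_missing(rows, variables, max_rate):
    """Keep only variables whose missing rate does not exceed `max_rate`."""
    return [v for v in variables if missing_rate(rows, v) <= max_rate]


def complete_cases(rows, variables):
    """Complete case analysis: keep records with no missing value in `variables`."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]


def dichotomize(rows, var, cutoff):
    """Replace a continuous variable with a 0/1 indicator (1 if value >= cutoff)."""
    return [{**r, var: int(r[var] >= cutoff)} for r in rows]


# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 71, "fev1": 2.1, "bmi": None},
    {"age": 64, "fev1": None, "bmi": None},
    {"age": 58, "fev1": 1.8, "bmi": 22.0},
    {"age": 80, "fev1": 2.5, "bmi": None},
]
variables = ["age", "fev1", "bmi"]

kept = drop_high_missing(rows, variables, max_rate=0.5)  # assumed threshold
cohort = complete_cases(rows, kept)
cohort = dichotomize(cohort, "age", cutoff=70)           # assumed cutoff
print(kept, [r["age"] for r in cohort])
```

Dichotomization discards information within each category, so whether it helps depends on the model and on the number of variables, which is consistent with the abstract reporting the gain mainly for the tree ensembles with fewer variables.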

CONCLUSIONS

The data-driven prediction model built on the ePath system performed adequately in predicting PAL after video-assisted thoracoscopic surgery in a real-world setting when variables were optimized and population characteristics were taken into account.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a6b9/12000770/1c7d3c794466/LRH2-9-e10469-g002.jpg
