Prediction of outpatient rehabilitation patient preferences and optimization of graded diagnosis and treatment based on XGBoost machine learning algorithm

Author information

Fan Xuehui, Ye Ruixue, Gao Yan, Xue Kaiwen, Zhang Zeyu, Xu Jing, Zhao Jingpu, Feng Jun, Wang Yulong

Affiliations

Department of Rehabilitation Medicine, The First Affiliated Hospital of Shenzhen University, The Second People's Hospital of Shenzhen, Shenzhen, Guangdong, China.

Linping Hospital of Integrated Traditional Chinese and Western Medicine, Hangzhou, Zhejiang, China.

Publication information

Front Artif Intell. 2025 Jan 15;7:1473837. doi: 10.3389/frai.2024.1473837. eCollection 2024.

DOI:10.3389/frai.2024.1473837

PMID:39881882

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11776094/

Abstract

BACKGROUND

Departments of rehabilitation medicine are key to improving patients' quality of life. With the growing burden of chronic disease and an aging population, outpatient facilities need to become more efficient and allocate resources better. This study aims to analyze the treatment preferences of outpatient rehabilitation patients by building predictive models from patient data and a grading tool, with the goal of improving visit efficiency and optimizing resource allocation.

METHODS

Data were collected from 4,244 patients attending outpatient rehabilitation clinics at 38 institutions in China. Data were processed in Python: the pandas library was used for data cleaning and preprocessing of 68 categorical and 12 continuous variables, including handling missing values, normalizing the data, and encoding categorical variables. The Scikit-learn library was used to split the data into a training set (80%) and a test set (20%), so that each model was evaluated on data independent of training, which helps detect overfitting. XGBoost, random forest, and logistic regression were compared using metrics including accuracy and receiver operating characteristic (ROC) curves. The SMOTE technique from the imbalanced-learn library was used to address class imbalance during model training. The model was refined using confusion-matrix and feature-importance analyses, and partial dependence plots (PDPs) were used to analyze the key influencing factors.
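The abstract names the preprocessing, splitting, and resampling steps but gives no code. The following stdlib-only Python sketch illustrates three of them under stated assumptions: a stratified 80/20 split, min-max normalization fitted on the training rows only, and SMOTE oversampling applied to the training set only. It is not the authors' pipeline; stratification, the function names (`stratified_split`, `fit_minmax`, `apply_minmax`, `smote`), the neighbour count, and the toy data are all hypothetical. In practice, scikit-learn's `train_test_split` (with `stratify=y`) and imbalanced-learn's `SMOTE` cover the same ground.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_split(y, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx), holding out about test_frac of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)


def fit_minmax(rows):
    """Learn per-column minimum and maximum from training rows only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_minmax(rows, lo, hi):
    """Scale columns with training bounds; constant columns map to 0.0."""
    return [[(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(row, lo, hi)]
            for row in rows]


def smote(X, y, k=5, seed=0):
    """Oversample each non-majority class to the majority count by SMOTE interpolation.

    Classes with fewer than two samples are left unchanged (no neighbour exists).
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)
    for label, n in counts.items():
        if n == target or n < 2:
            continue
        members = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            base = rng.choice(members)
            # k nearest same-class neighbours of `base`, excluding `base` itself
            neighbours = sorted((m for m in members if m is not base),
                                key=lambda m: math.dist(base, m))[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # random point on the segment base -> other
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


# Toy usage with hypothetical data (not the study's 4,244 patients).
X = [[float(i), float(i % 3)] for i in range(20)]
y = [1] * 15 + [0] * 5
train_idx, test_idx = stratified_split(y, test_frac=0.2, seed=42)
lo, hi = fit_minmax([X[i] for i in train_idx])
X_train = apply_minmax([X[i] for i in train_idx], lo, hi)
X_test = apply_minmax([X[i] for i in test_idx], lo, hi)  # test set is never resampled
y_train = [y[i] for i in train_idx]
X_bal, y_bal = smote(X_train, y_train, k=3, seed=0)
print(Counter(y_train), Counter(y_bal))
```

Fitting the scaler and running SMOTE only on the training portion keeps test-set information out of training, so the held-out 20% still measures performance on unseen data.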

RESULTS

XGBoost achieved the highest overall accuracy (80.21%), with high precision and recall for Category 1. Random forest showed similar overall accuracy. Logistic regression had significantly lower accuracy, suggesting difficulty modeling nonlinear relationships in the data. The key influencing factors identified included distance to the medical institution, arrival time, length of hospital stay, and specific diseases such as cardiovascular, pulmonary, oncological, and orthopedic conditions. The tiered diagnosis and treatment tool effectively helped doctors assess patients' conditions and recommend suitable medical institutions based on rehabilitation grading.
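The results combine per-class metrics (overall accuracy, plus precision and recall for Category 1) with partial dependence analysis of key factors. A minimal stdlib-only Python sketch of both computations follows. The labels, the data, and `toy_predict` (a hypothetical stand-in for a fitted model's predicted score, such as a class probability) are illustrative only, not study results. scikit-learn offers equivalents in `sklearn.metrics.confusion_matrix`, `sklearn.metrics.classification_report`, and `sklearn.inspection.partial_dependence`.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Counts with rows = true class and columns = predicted class."""
    pos = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[pos[t]][pos[p]] += 1
    return m


def summarize(m, labels):
    """Overall accuracy plus (precision, recall) for each class."""
    total = sum(sum(row) for row in m)
    accuracy = sum(m[i][i] for i in range(len(labels))) / total
    per_class = {}
    for i, lab in enumerate(labels):
        tp = m[i][i]
        predicted = sum(row[i] for row in m)  # column sum: all cases predicted as `lab`
        actual = sum(m[i])                    # row sum: all cases truly in `lab`
        per_class[lab] = (tp / predicted if predicted else 0.0,
                          tp / actual if actual else 0.0)
    return accuracy, per_class


def partial_dependence(predict, X, feature, grid):
    """One-way partial dependence: mean prediction with column `feature` set to each grid value."""
    curve = []
    for v in grid:
        rows = [row[:feature] + [v] + row[feature + 1:] for row in X]
        curve.append(sum(predict(r) for r in rows) / len(rows))
    return curve


# Toy usage (hypothetical labels and model, not study data).
labels = [0, 1, 2]
y_true = [1, 1, 1, 1, 0, 0, 2, 2]
y_pred = [1, 1, 1, 0, 0, 1, 2, 2]
m = confusion_matrix(y_true, y_pred, labels)
acc, per_class = summarize(m, labels)


def toy_predict(row):
    """Hypothetical model score; column 0 could stand for a factor such as distance."""
    return 0.5 * row[0] + row[1]


X = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
pd_curve = partial_dependence(toy_predict, X, feature=0, grid=[0.0, 2.0])
print(acc, per_class, pd_curve)
```

Precision divides by the column total (everything predicted as that class) and recall by the row total (everything truly in that class). A partial dependence curve averages over the observed values of the other features instead of fixing them at a single point.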

CONCLUSION

This study confirmed that ensemble learning methods, particularly XGBoost, outperform single models in classification tasks involving complex datasets. Addressing class imbalance and enhancing feature engineering can further improve model performance. Understanding patient preferences and the factors influencing medical institution selection can guide healthcare policies to optimize resource allocation, improve service quality, and enhance patient satisfaction. Tiered diagnosis and treatment tools play a crucial role in helping doctors evaluate patient conditions and make informed recommendations for appropriate medical care.



Similar articles

Development of Machine Learning-based Algorithms to Predict the 2- and 5-year Risk of TKA After Tibial Plateau Fracture Treatment.

Clin Orthop Relat Res. 2025 Mar 12. doi: 10.1097/CORR.0000000000003442.

Machine learning-based predictive models for perioperative major adverse cardiovascular events in patients with stable coronary artery disease undergoing noncardiac surgery.

Comput Methods Programs Biomed. 2025 Mar;260:108561. doi: 10.1016/j.cmpb.2024.108561. Epub 2024 Dec 13.

Prediction of sepsis mortality in ICU patients using machine learning methods.

BMC Med Inform Decis Mak. 2024 Aug 16;24(1):228. doi: 10.1186/s12911-024-02630-z.

Enhanced prediction of ventilator-associated pneumonia in patients with traumatic brain injury using advanced machine learning techniques.

Sci Rep. 2025 Apr 2;15(1):11363. doi: 10.1038/s41598-025-95779-0.

Predicting total healthcare demand using machine learning: separate and combined analysis of predisposing, enabling, and need factors.

BMC Health Serv Res. 2025 Mar 12;25(1):366. doi: 10.1186/s12913-025-12502-5.

Performance of machine learning models in predicting difficult laryngoscopy in the emergency department: a single-centre retrospective study comparing with conventional regression method.

BMC Emerg Med. 2025 Feb 21;25(1):28. doi: 10.1186/s12873-025-01185-0.

Machine learning algorithms for predicting COVID-19 mortality in Ethiopia.

BMC Public Health. 2024 Jun 28;24(1):1728. doi: 10.1186/s12889-024-19196-0.

Predicting conversion of ambulatory ACDF patients to inpatient: a machine learning approach.

Spine J. 2024 Apr;24(4):563-571. doi: 10.1016/j.spinee.2023.11.010. Epub 2023 Nov 21.

Prediction of lumbar disc degeneration based on interpretable machine learning models: retrospective cohort study.

Spine J. 2025 Apr 9. doi: 10.1016/j.spinee.2025.04.004.

Cited by

Enhancing patient rehabilitation outcomes: artificial intelligence-driven predictive modeling for home discharge in neurological and orthopedic conditions.

J Neuroeng Rehabil. 2025 May 26;22(1):117. doi: 10.1186/s12984-025-01654-4.

