Fan Xuehui, Ye Ruixue, Gao Yan, Xue Kaiwen, Zhang Zeyu, Xu Jing, Zhao Jingpu, Feng Jun, Wang Yulong
Department of Rehabilitation Medicine, The First Affiliated Hospital of Shenzhen University, The Second People's Hospital of Shenzhen, Shenzhen, Guangdong, China.
Linping Hospital of Integrated Traditional Chinese and Western Medicine, Hangzhou, Zhejiang, China.
Front Artif Intell. 2025 Jan 15;7:1473837. doi: 10.3389/frai.2024.1473837. eCollection 2024.
The Department of Rehabilitation Medicine is key to improving patients' quality of life. Driven by chronic diseases and an aging population, there is a need to enhance the efficiency and resource allocation of outpatient facilities. This study aims to analyze the treatment preferences of outpatient rehabilitation patients by using data and a grading tool to establish predictive models. The goal is to improve patient visit efficiency and optimize resource allocation through these predictive models.
Data were collected from 38 Chinese institutions and covered 4,244 patients visiting outpatient rehabilitation clinics. Data processing was conducted in Python. The pandas library was used for data cleaning and preprocessing of 68 categorical and 12 continuous variables; the steps included handling missing values, data normalization, and encoding conversion. The data were divided into 80% training and 20% test sets using the Scikit-learn library so that model performance could be evaluated on independent data and overfitting could be detected. XGBoost, random forest, and logistic regression were compared using metrics including accuracy and receiver operating characteristic (ROC) curves. The SMOTE technique from the imbalanced-learn library was used to address sample imbalance during model training. The model was optimized using a confusion matrix and feature importance analysis, and partial dependence plots (PDP) were used to analyze the key influencing factors.
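The split-then-oversample order described above can be illustrated with a minimal sketch. It uses only the Python standard library rather than the Scikit-learn and imbalanced-learn calls the study reports, and the helper names (`stratified_split`, `smote_like`), the toy data, and the parameter values are assumptions made for illustration. The interpolation step is a simplified version of the SMOTE idea, not the library's implementation.

```python
import math
import random
from collections import defaultdict


def stratified_split(y, test_frac=0.2, seed=0):
    """Split sample indices per class so both sets keep the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (hypothetical): 40 majority samples (label 0), 10 minority (label 1).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 40 + [1] * 10

train_idx, test_idx = stratified_split(y, test_frac=0.2, seed=1)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]

# Oversample only the training minority class; the test set stays untouched.
minority = [x for x, label in zip(X_train, y_train) if label == 1]
n_needed = y_train.count(0) - y_train.count(1)
X_balanced = X_train + smote_like(minority, n_needed, k=3, seed=2)
y_balanced = y_train + [1] * n_needed
```

Oversampling after the split, and only on the training portion, keeps synthetic points from leaking information about held-out patients into the evaluation.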
XGBoost achieved the highest overall accuracy (80.21%), with high precision and recall for Category 1. Random forest showed a similar overall accuracy. Logistic regression had a significantly lower accuracy, indicating difficulties with nonlinear data. The key influencing factors identified include distance to medical institutions, arrival time, length of hospital stay, and specific diseases, such as cardiovascular, pulmonary, oncological, and orthopedic conditions. The tiered diagnosis and treatment tool effectively helped doctors assess patients' conditions and recommend suitable medical institutions based on rehabilitation grading.
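To make the reported quantities concrete, the sketch below shows how overall accuracy and per-category precision and recall follow from the counts in a confusion matrix. The labels and predictions are invented for illustration and are not the study's data; `classification_summary` is a hypothetical helper name.

```python
from collections import Counter


def classification_summary(y_true, y_pred):
    """Return overall accuracy and per-class precision/recall."""
    labels = sorted(set(y_true) | set(y_pred))
    # pairs[(t, p)] is the confusion-matrix cell: true class t, predicted p.
    pairs = Counter(zip(y_true, y_pred))
    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)
    per_class = {}
    for c in labels:
        tp = pairs[(c, c)]
        predicted = sum(pairs[(t, c)] for t in labels)  # column total
        actual = sum(pairs[(c, p)] for p in labels)     # row total
        per_class[c] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return accuracy, per_class


# Hypothetical labels for three categories (not taken from the study).
y_true = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
y_pred = [1, 1, 1, 2, 2, 2, 3, 3, 3, 1]
accuracy, per_class = classification_summary(y_true, y_pred)
```

A model can reach a high overall accuracy while a minority category has poor recall, which is why per-category figures are worth reading alongside the headline accuracy on imbalanced data.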
This study confirmed that ensemble learning methods, particularly XGBoost, outperform single models in classification tasks involving complex datasets. Addressing class imbalance and enhancing feature engineering can further improve model performance. Understanding patient preferences and the factors influencing medical institution selection can guide healthcare policies to optimize resource allocation, improve service quality, and enhance patient satisfaction. Tiered diagnosis and treatment tools play a crucial role in helping doctors evaluate patient conditions and make informed recommendations for appropriate medical care.