Rafique Daniel, Liu Xuan, Gong Bo, Belsito Laura, McCradden Melissa D, Mazwi Mjaye L, Lee Wayne, Ohanlon Graham, Tsang Kyle, Shroff Manohar, Ertl-Wagner Birgit, Khalvati Farzad
Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada.
The Hospital for Sick Children, Research Institute, Toronto, ON, Canada.
Front Artif Intell. 2025 Sep 3;8:1652397. doi: 10.3389/frai.2025.1652397. eCollection 2025.
Missed appointments (no-shows) are a persistent problem that leaves resources idle and delays critical patient prognoses. Likewise, long waiting times frustrate patients and leave a negative impression of the appointment. In this paper, we examine 3 modalities of diagnostic and interventional radiology appointments for pediatric patients at the Hospital for Sick Children (SickKids), Toronto, ON, Canada. Our goal was to survey machine learning methods and identify those that best predict the risk of patient no-shows and of wait times exceeding 1 hour, so that scheduling teams can propose targeted downstream accommodations.
We evaluated 6 types of predictive model, each trained separately on both tasks: extreme gradient boosting (XGBoost), Random Forest (RF), Support Vector Machine, Logistic Regression, Artificial Neural Network, and a pre-trained large language model (LLM). Using 20 features that combine patient demographics and appointment-related data, we tested different data balancing methods, including instance hardness threshold (IHT) and class weighting, to reduce bias in prediction. We then conducted a comparative study of the gains from adding continuous contextual data to our LLM, which yielded a 51% improvement in F1 score for the wait-time model.
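Class weighting can be illustrated with a minimal sketch. It assumes the common "balanced" heuristic, in which each class receives the weight n_samples / (n_classes * class_count); the abstract does not state which weighting scheme was used, and the function name and toy labels below are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class, inversely proportional to class frequency.

    Assumed scheme (not confirmed by the source):
    weight = n_samples / (n_classes * class_count)
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical toy labels: 1 = no-show (rare), 0 = attended (common).
labels = [0] * 9 + [1] * 1
weights = balanced_class_weights(labels)

# The rare class is weighted 5.0, the common class roughly 0.56.
print(weights)
```

With these weights, an error on the rare no-show class costs about nine times as much as an error on the majority class during training, which discourages a model from simply predicting "attended" for everyone.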
Our XGBoost model had the best combination of AUC and F1 scores (0.96 and 0.62, respectively) for predicting no-shows, while RF had the best AUC and F1 scores (0.83 and 0.61, respectively) for wait-time prediction. The LLMs also performed well at 90% probability thresholds (high-risk patients) and remained well calibrated on unseen test data.
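The two reported metrics can be sketched as follows, to show why AUC and F1 can differ so much for the same model: AUC depends only on how the probabilities rank patients, whereas F1 depends on the chosen probability threshold. The function names and the toy labels and probabilities are hypothetical and are not taken from the study.

```python
def f1_at_threshold(y_true, y_prob, threshold=0.5):
    """F1 score after turning probabilities into 0/1 predictions."""
    tp = fp = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, y_prob):
    """AUC as the chance a random positive outranks a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = event (e.g. no-show), 0 = no event.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.92, 0.95]

auc = roc_auc(y_true, y_prob)                 # 14/15, about 0.93
f1_default = f1_at_threshold(y_true, y_prob)  # threshold 0.5, about 0.67
f1_high_risk = f1_at_threshold(y_true, y_prob, threshold=0.9)  # 0.8
print(auc, f1_default, f1_high_risk)
```

In this toy case the ranking is nearly perfect, yet F1 changes with the threshold; restricting attention to the 90% threshold isolates the high-risk patients at the cost of missing lower-scored events.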
We surveyed multiple algorithms and data balancing methods to identify the best-performing models for our tasks, implemented a unique methodology for applying LLMs to heterogeneous data in this domain, and demonstrated that contextual appointment data matter more than patient demographic features, which supports a more equitable prediction algorithm. Going forward, the predictive output (calibrated probabilities of events) can be used as stochastic input for risk-based optimized scheduling to provide accommodations for patients less likely to receive quality access to healthcare.
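One way calibrated probabilities could serve as stochastic input is sketched below. It assumes that no-shows are independent across patients, so the number of no-shows in a block of appointments follows a Poisson binomial distribution; this independence assumption, the function name, and the example probabilities are illustrative and are not part of the study.

```python
def no_show_count_distribution(probs):
    """Distribution of the number of no-shows among independent appointments.

    probs[i] is the calibrated no-show probability of appointment i.
    Returns a list where entry k is P(exactly k no-shows).
    """
    dist = [1.0]  # with zero appointments, zero no-shows is certain
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # this patient attends
            nxt[k + 1] += mass * p      # this patient does not show
        dist = nxt
    return dist


# Hypothetical calibrated no-show probabilities for three appointments.
probs = [0.1, 0.5, 0.9]
dist = no_show_count_distribution(probs)

expected_no_shows = sum(k * mass for k, mass in enumerate(dist))  # 1.5
prob_two_or_more = sum(dist[2:])                                  # 0.5
print(dist, expected_no_shows, prob_two_or_more)
```

A scheduling optimizer could consume such a distribution directly, for example to decide how many standby patients to book for a block, but this only makes sense if the input probabilities are well calibrated.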