Pandey Sanjib Raj, Tile Joy Dooshima, Oghaz Mahdi Maktab Dar
The Royal Marsden NHS Foundation Trust, Digital Services, London, United Kingdom.
Faculty of Science and Engineering, Anglia Ruskin University, Cambridge, United Kingdom.
PLoS One. 2025 Sep 2;20(9):e0328848. doi: 10.1371/journal.pone.0328848. eCollection 2025.
Hospital readmission prediction is a crucial area of research because of its impact on healthcare expenditure, patient care quality, and policy formulation. Accurately predicting patient readmissions within 30 days of discharge remains a considerable challenge, given the complexity of healthcare data, which includes both structured data (e.g., demographic and clinical variables) and unstructured data (e.g., clinical notes and medical images). Consequently, there is an increasing need for hybrid approaches that effectively integrate these two data types to improve all-cause readmission prediction. Despite notable advances in machine learning, existing predictive models often struggle to achieve both high precision and balanced predictions, mainly because of the variability in patient outcomes and the complex factors that influence readmission. This study addresses these challenges by developing a hybrid predictive model that combines structured data with unstructured text representations derived from ClinicalT5, a transformer-based large language model. The hybrid models are evaluated against text-only models, such as PubMedBERT, using multiple metrics, including accuracy, precision, recall, and AUROC. The results show that the hybrid models, which integrate both structured and unstructured data, outperform text-only models trained on the same dataset. Specifically, the hybrid models achieve higher precision and balanced recall, reducing false positives and providing more reliable predictions. This research underscores the potential of hybrid data integration using ClinicalT5 to improve hospital readmission prediction, supporting better clinical decision-making and helping to reduce unnecessary readmissions.
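The abstract describes two ideas that a short example can make concrete: combining structured features with a text representation, and scoring predictions with accuracy, precision, recall, and AUROC. The following is a minimal sketch under stated assumptions, not the authors' pipeline. It assumes the simplest possible integration (concatenating the two feature vectors), which the abstract does not confirm, and it uses made-up numbers in place of real ClinicalT5 embeddings and model scores. All function names (`fuse_features`, `auroc`, `classification_metrics`) are hypothetical.

```python
from typing import Dict, List, Sequence


def fuse_features(structured: Sequence[float], text_embedding: Sequence[float]) -> List[float]:
    """Concatenate structured features with a text embedding (assumed fusion scheme)."""
    return list(structured) + list(text_embedding)


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and AUROC for binary labels (1 = readmitted)."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Precision falls as false positives rise; defined as 0.0 if nothing is predicted positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "auroc": auroc(y_true, y_score),
    }


# Illustrative values only: two structured features (age, prior admissions)
# and a three-dimensional stand-in for a text embedding.
fused = fuse_features([65.0, 2.0], [0.12, -0.40, 0.07])

# Illustrative labels and predicted readmission probabilities for six patients.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.4, 0.1]
metrics = classification_metrics(labels, scores)

print(fused)
print(metrics)
```

In this sketch the fused vector would be the input to whatever classifier is trained downstream. Precision is the metric most directly tied to the false-positive reduction the abstract reports, since it is the share of predicted readmissions that were actual readmissions.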