The prediction of hospital length of stay using unstructured data.

Author information

Pôle Territorial Santé Publique et Performance, Centre Hospitalier de Troyes, 101 Avenue Anatole France CS 10718, 10003, Troyes Cedex, France.

Research and Consulting, CODOC SAS, 75008, Paris, France.

Publication information

BMC Med Inform Decis Mak. 2021 Dec 18;21(1):351. doi: 10.1186/s12911-021-01722-4.

Abstract

OBJECTIVE

This study aimed to assess whether machine learning-based predictions of hospital length of stay (LOS) improve when clinical signs recorded in free text are taken into account, compared with the traditional approach of considering only structured information such as age, gender and major ICD diagnosis.

METHODS

This observational retrospective cohort study analyzed patient stays with admission dates between 1 January and 24 September 2019. Each included stay involved a patient who was admitted through the Emergency Department (ED) and stayed for more than two days in the subsequent service. LOS was predicted using two random forest models. The first included unstructured text extracted from electronic health records (EHRs). The EHR text was processed with a word-embedding algorithm based on UMLS terminology, using exact matching restricted to patient-centric affirmative sentences. The second model was primarily based on structured data in the form of diagnoses coded with the International Classification of Diseases, 10th Revision (ICD-10) and triage codes (CCMU/GEMSA classifications). Variables common to both models were: age, gender, zip/postal code, LOS in the ED, recent visit flag, assigned patient ward after the ED stay and short-term ED activity. Models were trained on 80% of the data, and performance was evaluated as accuracy on the remaining 20% held out as test data.
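The evaluation protocol described above (two random forests sharing a set of common variables, an 80/20 split, accuracy on the held-out part) can be sketched roughly as follows. This is only an illustration on synthetic data, assuming scikit-learn and a binary short/long stay target. The feature matrices, their sizes, the target definition and the hyperparameters are all hypothetical and are not taken from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_stays = 1000

# Synthetic stand-ins (hypothetical): variables shared by both models,
# text-derived features, and coded (ICD-10 / triage) features.
shared = rng.normal(size=(n_stays, 7))
text_features = rng.normal(size=(n_stays, 20))
coded_features = rng.normal(size=(n_stays, 10))

# Hypothetical binary target (0 = shorter stay, 1 = longer stay).
signal = shared[:, 0] + 0.5 * text_features[:, 0] + 0.5 * coded_features[:, 0]
y = (signal + rng.normal(scale=0.5, size=n_stays) > 0).astype(int)

# One 80/20 split of the stay indices, reused by both models so that
# their test-set predictions refer to the same stays.
idx_train, idx_test = train_test_split(
    np.arange(n_stays), test_size=0.2, random_state=0, stratify=y
)

def fit_and_score(X):
    """Train a random forest on the 80% split and score it on the 20% split."""
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X[idx_train], y[idx_train])
    preds = model.predict(X[idx_test])
    return accuracy_score(y[idx_test], preds), preds

# Model 1: shared variables + text-derived features.
acc_text, preds_text = fit_and_score(np.hstack([shared, text_features]))
# Model 2: shared variables + coded features.
acc_coded, preds_coded = fit_and_score(np.hstack([shared, coded_features]))

print(f"text model accuracy:  {acc_text:.3f}")
print(f"coded model accuracy: {acc_coded:.3f}")
```

Reusing one index split for both models keeps the comparison paired: any difference in accuracy then comes from the features, not from which stays happened to land in the test set.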

RESULTS

The model using unstructured data achieved 75.0% accuracy, compared with 74.1% for the model using structured data. The two models produced a similar prediction in 86.6% of cases. In a secondary analysis restricted to intensive care patients, the accuracy of the two models was also similar (76.3% vs 75.0%).
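Accuracy and between-model agreement are distinct quantities: two models can reach the same accuracy while disagreeing on individual stays. The toy example below shows how both figures might be computed, assuming that "similar prediction" means the two models assign the same class to a stay. The labels and predictions are invented for illustration and are not study data.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases where the prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def agreement(pred_a, pred_b):
    """Fraction of cases where two models predict the same class."""
    return sum(a == b for a, b in zip(pred_a, pred_b)) / len(pred_a)

# Invented labels and predictions for ten stays (hypothetical).
y_true     = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
pred_text  = [0, 1, 1, 0, 0, 0, 1, 1, 1, 0]  # errors on stays 4 and 6
pred_coded = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0]  # errors on stays 2 and 6

print(accuracy(y_true, pred_text))       # 0.8
print(accuracy(y_true, pred_coded))      # 0.8
print(agreement(pred_text, pred_coded))  # 0.8
```

Here both models score 0.8, yet they agree on only 8 of the 10 stays, because each makes one error that the other does not.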

CONCLUSIONS

LOS prediction using unstructured data achieved accuracy similar to that of prediction using structured data, so unstructured data can be considered useful for modeling LOS accurately.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/534e/8684269/bb5c738120c2/12911_2021_1722_Fig1_HTML.jpg
