Development of a Natural Language Processing (NLP) model to automatically extract clinical data from electronic health records: results from an Italian comprehensive stroke center.

Author information

Department of Computing Sciences, Bocconi University, Milano, Italy; Artificial Intelligence Center, Humanitas Clinical and Research Center - IRCCS, Via A. Manzoni 56, Rozzano 20089, Milan, Italy.

Department of Biomedical Sciences, Humanitas University, via Rita Levi Montalcini 4, 20072 Pieve Emanuele, Milan, Italy.

Publication information

Int J Med Inform. 2024 Dec;192:105626. doi: 10.1016/j.ijmedinf.2024.105626. Epub 2024 Sep 19.

Abstract

INTRODUCTION

Data collection often relies on time-consuming manual inputs, with a vast amount of information embedded in unstructured texts such as patients' medical records and clinical notes. Our study aims to develop a pipeline that combines active learning (AL) and NLP techniques to enhance data extraction in an acute ischemic stroke cohort.

MATERIALS AND METHODS

Consecutive acute ischemic stroke patients who received reperfusion therapies at IRCCS Humanitas Research Hospital were included. An Italian Bidirectional Encoder Representations from Transformers (BERT) model was trained with AL to automatically extract clinical variables from electronic health record text. Simulated AL performance was evaluated on a set of labels representing patients' comorbidities, comparing Bayesian Active Learning by Disagreement (BALD) with random text selection. Prognostic models predicting patients' functional outcomes were trained with Gradient Boosting on manually labelled and on semi-automatically extracted data, and their performance was compared.
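
To make the BALD selection step concrete, here is a minimal sketch of how such an acquisition score is commonly computed. It is an illustration, not the authors' implementation: it assumes the classifier's probabilities come from several stochastic forward passes (for example Monte Carlo dropout, which the abstract does not specify), and the names `bald_scores` and `select_batch` are hypothetical. For a multi-label comorbidity task, the same score could be computed per label with two classes (present/absent).

```python
import numpy as np


def bald_scores(probs, eps=1e-12):
    """BALD score per text (hypothetical helper, not from the paper).

    probs: array of shape (T, N, C) holding class probabilities from
    T stochastic forward passes over N unlabelled texts and C classes.
    Returns an array of shape (N,).
    """
    probs = np.asarray(probs, dtype=float)
    # Entropy of the averaged prediction: total predictive uncertainty.
    mean_probs = probs.mean(axis=0)
    predictive_entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=-1)
    # Average entropy of each individual pass: uncertainty every pass agrees on.
    expected_entropy = -(probs * np.log(probs + eps)).sum(axis=-1).mean(axis=0)
    # The difference is large only when the passes disagree with each other.
    return predictive_entropy - expected_entropy


def select_batch(probs, k):
    """Indices of the k texts with the highest BALD score."""
    return np.argsort(-bald_scores(probs))[:k]


# Toy pool of two texts, two passes, two classes (assumed values).
toy_probs = np.array([
    [[0.5, 0.5], [0.9, 0.1]],  # pass 1
    [[0.5, 0.5], [0.1, 0.9]],  # pass 2
])
toy_scores = bald_scores(toy_probs)
toy_pick = select_batch(toy_probs, k=1)
```

In the toy pool, the first text is uncertain but every pass agrees, so its score is zero; the second text draws confident but contradictory predictions, so it is the one sent for labelling. A random-selection baseline would instead sample `k` indices uniformly from the unlabelled pool.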

RESULTS

The active learning process showed null performance until around 20% of texts were labelled, possibly due to freezing of the root layers in the BERT model. Overall, however, active learning improved learning efficiency across most comorbidities. Prognostic models trained on manually labelled data and on semi-automatically extracted data showed no significant difference in performance, indicating effective prediction in both settings.

CONCLUSIONS

We developed an efficient language model to automate the extraction of clinical data from Italian unstructured health texts in a cohort of ischemic stroke patients. In a preliminary analysis, we demonstrated its potential applicability for enhancing prediction model accuracy.
