Development of a Natural Language Processing (NLP) model to automatically extract clinical data from electronic health records: results from an Italian comprehensive stroke center.

Affiliations

Department of Computing Sciences, Bocconi University, Milano, Italy; Artificial Intelligence Center, Humanitas Clinical and Research Center - IRCCS, Via A. Manzoni 56, Rozzano 20089, Milan, Italy.

Department of Biomedical Sciences, Humanitas University, via Rita Levi Montalcini 4, 20072 Pieve Emanuele, Milan, Italy.

Publication information

Int J Med Inform. 2024 Dec;192:105626. doi: 10.1016/j.ijmedinf.2024.105626. Epub 2024 Sep 19.

DOI:10.1016/j.ijmedinf.2024.105626
PMID:39321491
Abstract

INTRODUCTION

Data collection often relies on time-consuming manual inputs, with a vast amount of information embedded in unstructured texts such as patients' medical records and clinical notes. Our study aims to develop a pipeline that combines active learning (AL) and NLP techniques to enhance data extraction in an acute ischemic stroke cohort.

MATERIALS AND METHODS

Consecutive acute ischemic stroke patients who received reperfusion therapies at IRCCS Humanitas Research Hospital were included. The Italian NLP Bidirectional Encoder Representations from Transformers (BERT) model was trained with AL to automatically extract clinical variables from electronic health text. Simulated active learning performances were evaluated on a set of labels representing patients' comorbidities, comparing Bayesian Uncertainty Sampling by Disagreement (BALD) and random text selection. Prognostic models predicting patients' functional outcomes using Gradient Boosting were trained on manually labelled and semi-automatically extracted data and their performance was compared.
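The comparison between BALD and random text selection can be illustrated with a small sketch. The abstract does not say how the acquisition score was computed, so the code below is an assumption: it uses the standard BALD formulation (mutual information between the label and the model parameters) for a single binary comorbidity label, estimated from several stochastic forward passes per text. The names `bald_score`, `select_batch`, and the example `pool` are hypothetical, and the probabilities are made-up illustrative values, not study data.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def bald_score(mc_probs: list[float]) -> float:
    """BALD score for one text: entropy of the mean prediction minus the
    mean entropy of the individual stochastic predictions."""
    mean_p = sum(mc_probs) / len(mc_probs)
    expected_entropy = sum(binary_entropy(p) for p in mc_probs) / len(mc_probs)
    return binary_entropy(mean_p) - expected_entropy


def select_batch(pool: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k unlabelled texts with the highest BALD score."""
    ranked = sorted(pool, key=lambda text_id: bald_score(pool[text_id]), reverse=True)
    return ranked[:k]


# Hypothetical pool: each text has three predicted probabilities for one
# comorbidity label, e.g. from repeated passes with dropout left on (assumption).
pool = {
    "note_a": [0.95, 0.96, 0.94],  # passes agree and are confident
    "note_b": [0.50, 0.51, 0.49],  # passes agree but are uncertain
    "note_c": [0.05, 0.95, 0.50],  # passes disagree with each other
}
chosen = select_batch(pool, k=1)
print(chosen)
```

The score is high only when the individual passes disagree, which is why `note_c` is ranked above `note_b` even though both have a mean prediction near 0.5. A random-selection baseline would instead draw `k` ids from `pool` uniformly, ignoring the scores.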

RESULTS

The active learning process initially showed null performance until around 20% of texts were labelled, possibly due to root layers freezing in the BERT model. Overall, however, active learning improved model learning efficiency across most comorbidities. Prognostic modelling showed no significant difference in performance between models trained on manually labelled versus semi-automatically extracted data, indicating effective prediction capabilities in both settings.

CONCLUSIONS

We developed an efficient language model to automate the extraction of clinical data from Italian unstructured health texts in a cohort of ischemic stroke patients. In a preliminary analysis, we demonstrated its potential applicability for enhancing prediction model accuracy.

Similar articles

1
Development of a Natural Language Processing (NLP) model to automatically extract clinical data from electronic health records: results from an Italian comprehensive stroke center.
Int J Med Inform. 2024 Dec;192:105626. doi: 10.1016/j.ijmedinf.2024.105626. Epub 2024 Sep 19.
2
Multifaceted Natural Language Processing Task-Based Evaluation of Bidirectional Encoder Representations From Transformers Models for Bilingual (Korean and English) Clinical Notes: Algorithm Development and Validation.
JMIR Med Inform. 2024 Oct 30;12:e52897. doi: 10.2196/52897.
3
Building large-scale registries from unstructured clinical notes using a low-resource natural language processing pipeline.
Artif Intell Med. 2024 May;151:102847. doi: 10.1016/j.artmed.2024.102847. Epub 2024 Mar 22.
4
A Natural Language Processing Model for COVID-19 Detection Based on Dutch General Practice Electronic Health Records by Using Bidirectional Encoder Representations From Transformers: Development and Validation Study.
J Med Internet Res. 2023 Oct 4;25:e49944. doi: 10.2196/49944.
5
Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing.
J Biomed Inform. 2022 Mar;127:103984. doi: 10.1016/j.jbi.2021.103984. Epub 2022 Jan 7.
6
Classifying the lifestyle status for Alzheimer's disease from clinical notes using deep learning with weak supervision.
BMC Med Inform Decis Mak. 2022 Jul 7;22(Suppl 1):88. doi: 10.1186/s12911-022-01819-4.
7
Developing Artificial Intelligence Models for Extracting Oncologic Outcomes from Japanese Electronic Health Records.
Adv Ther. 2023 Mar;40(3):934-950. doi: 10.1007/s12325-022-02397-7. Epub 2022 Dec 22.
8
Designing an openEHR-Based Pipeline for Extracting and Standardizing Unstructured Clinical Data Using Natural Language Processing.
Methods Inf Med. 2020 Dec;59(S 02):e64-e78. doi: 10.1055/s-0040-1716403. Epub 2020 Oct 14.
9
Evaluating Medical Entity Recognition in Health Care: Entity Model Quantitative Study.
JMIR Med Inform. 2024 Oct 17;12:e59782. doi: 10.2196/59782.
10
Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records.
BMC Med Inform Decis Mak. 2022 Mar 23;22(1):72. doi: 10.1186/s12911-022-01810-z.

Cited by

1
The Use of Machine Learning for Analyzing Real-World Data in Disease Prediction and Management: Systematic Review.
JMIR Med Inform. 2025 Jun 19;13:e68898. doi: 10.2196/68898.