Transformers and large language models are efficient feature extractors for electronic health record studies.

Author information

Yuan Kevin, Yoon Chang Ho, Gu Qingze, Munby Henry, Walker A Sarah, Zhu Tingting, Eyre David W

Affiliations

Big Data Institute, Nuffield Department of Population Health, University of Oxford, Oxford, UK.

Nuffield Department of Medicine, University of Oxford, Oxford, UK.

Publication information

Commun Med (Lond). 2025 Mar 21;5(1):83. doi: 10.1038/s43856-025-00790-1.

Abstract

BACKGROUND

Free-text data is abundant in electronic health records, but challenges in accurate and scalable information extraction mean less specific clinical codes are often used instead.

METHODS

We evaluated the efficacy of feature extraction using modern natural language processing (NLP) methods and large language models (LLMs) on 938,150 hospital antibiotic prescriptions from Oxfordshire, UK. Specifically, we investigated inferring the type(s) of infection from a free-text "indication" field, where clinicians state the reason for prescribing antibiotics. Clinical researchers labelled a subset of the 4000 most frequent unique indications (representing 692,310 prescriptions) into 11 categories describing the infection source or clinical syndrome. Various models were then trained to determine the binary presence/absence of these infection types, as well as any uncertainty expressed by clinicians.
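The abstract does not include the training code, but as a hedged illustration, a multi-label (presence/absence) fine-tune of this kind can be sketched with the HuggingFace transformers library. The checkpoint name emilyalsentzer/Bio_ClinicalBERT is the publicly released Bio+Clinical BERT model; the example indication strings, label vectors, and hyperparameters below are hypothetical, not the study's actual configuration.

```python
# Minimal sketch (not the authors' code): fine-tuning Bio+Clinical BERT for
# multi-label classification of free-text antibiotic indications.
# Assumes the public "emilyalsentzer/Bio_ClinicalBERT" checkpoint; the texts,
# label vectors, and hyperparameters below are illustrative only.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

N_CATEGORIES = 11  # infection sources / clinical syndromes

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "emilyalsentzer/Bio_ClinicalBERT",
    num_labels=N_CATEGORIES,
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss
)

# Hypothetical indication strings and 11-dimensional binary label vectors.
texts = ["community acquired pneumonia", "?UTI", "abdominal sepsis"]
labels = torch.tensor([[1,0,0,0,0,0,0,0,0,0,0],
                       [0,1,0,0,0,0,0,0,0,0,0],
                       [0,0,1,0,0,0,0,0,0,0,0]], dtype=torch.float)

enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
loader = DataLoader(
    TensorDataset(enc["input_ids"], enc["attention_mask"], labels),
    batch_size=2, shuffle=True,
)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for input_ids, attention_mask, y in loader:
    out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
    out.loss.backward()  # BCEWithLogitsLoss is computed internally
    optimizer.step()
    optimizer.zero_grad()
```

With problem_type="multi_label_classification", each of the 11 categories gets an independent sigmoid output, so a single indication such as "aspiration pneumonia / sepsis" can be positive for more than one infection source.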

RESULTS

We show on separate internal (n = 2000 prescriptions) and external (n = 2000 prescriptions) test datasets that a fine-tuned domain-specific Bio+Clinical BERT model performs best across the 11 categories (average F1 score 0.97 and 0.98, respectively) and outperforms traditional regular expression (F1 = 0.71 and 0.74) and n-grams/XGBoost (F1 = 0.86 and 0.84) models. A zero-shot OpenAI GPT-4 model matches the performance of traditional NLP models without the need for labelled training data (F1 = 0.71 and 0.86), and a fine-tuned GPT-3.5 model achieves performance similar to the fine-tuned BERT-based model (F1 = 0.95 and 0.97). Free-text indications reveal specific infection sources 31% more often than ICD-10 codes.
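For the zero-shot setting, the snippet below is a minimal sketch of how an indication string could be sent to GPT-4 via the OpenAI chat API. The prompt wording, the abbreviated three-category list, and the JSON reply convention are our assumptions for illustration, not the study's actual protocol.

```python
# Illustrative sketch only: one way to frame zero-shot GPT-4 labelling of an
# antibiotic indication. The prompt, the shortened category list, and the JSON
# output format are assumptions, not the paper's protocol.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CATEGORIES = ["respiratory", "urinary", "abdominal"]  # 3 of the 11, for brevity

def classify_indication(indication: str) -> dict:
    prompt = (
        "A clinician wrote this antibiotic prescribing indication: "
        f"'{indication}'. For each category in {CATEGORIES}, state whether "
        "that infection source is present (true/false), and state whether "
        "the clinician expressed uncertainty. Reply as JSON with keys "
        "'labels' (category -> bool) and 'uncertain' (bool)."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)

print(classify_indication("?CAP vs aspiration"))
```

Setting temperature=0 keeps the labelling as deterministic as the API allows, which matters when the same prompt template is applied across hundreds of thousands of prescriptions.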

CONCLUSIONS

Modern transformer-based models have the potential to be used widely throughout medicine to extract information from structured free-text records, to facilitate better research and patient care.


Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ce5/11928488/d9d94726daef/43856_2025_790_Fig1_HTML.jpg
