Large language models for accurate disease detection in electronic health records: the examples of crystal arthropathies.

Author information

Bürgisser Nils, Chalot Etienne, Mehouachi Samia, Buclin Clement P, Lauper Kim, Courvoisier Delphine S, Mongin Denis

Affiliations

Division of Rheumatology, Geneva University Hospitals, Geneva, Switzerland.

Division of Internal Medicine, Geneva University Hospitals, Geneva, Switzerland.

Publication information

RMD Open. 2024 Dec 20;10(4):e005003. doi: 10.1136/rmdopen-2024-005003.

Abstract

OBJECTIVES

We propose and test a framework for detecting disease diagnoses in French-language electronic health record (EHR) documents using a recent large language model (LLM), Meta's Llama-3-8B. Specifically, the study focuses on detecting gout ('goutte' in French), a common French word that has several meanings besides the disease (it also means 'drop'). The study compares the performance of the LLM-based framework with that of traditional natural language processing techniques and tests how its performance depends on the model parameters used.
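To illustrate why a keyword search struggles with 'goutte', here is a minimal Python sketch of a traditional rule-based (regular expression) detector. The patterns and the function name `regex_flags_gout` are hypothetical; the abstract does not give the study's actual rules.

```python
import re

# Hypothetical rules for illustration only; the study's regex is not given in the abstract.
# Matches the French word "goutte" or "gouttes", case-insensitively.
GOUT_TERM = re.compile(r"\bgouttes?\b", re.IGNORECASE)

# Hypothetical non-disease contexts in which "goutte" means "drop".
NON_DISEASE_CONTEXTS = [
    re.compile(r"goutte[- ]à[- ]goutte", re.IGNORECASE),     # drip infusion
    re.compile(r"\bgouttes?\s+(?:de\b|d')", re.IGNORECASE),  # "une goutte de sang"
    re.compile(r"\bgouttes?\s+(?:oculaires|nasales|auriculaires)\b", re.IGNORECASE),  # eye/nose/ear drops
]


def regex_flags_gout(paragraph: str) -> bool:
    """Return True if 'goutte' still appears after removing the listed non-disease phrases."""
    cleaned = paragraph
    for pattern in NON_DISEASE_CONTEXTS:
        cleaned = pattern.sub(" ", cleaned)
    return GOUT_TERM.search(cleaned) is not None


print(regex_flags_gout("Crise de goutte au gros orteil droit."))  # True
print(regex_flags_gout("Une goutte de sang au doigt."))           # False
```

Every non-disease sense has to be listed by hand. Any sense the list misses, or any true gout mention that happens to match an exclusion, is misclassified. This brittleness is what motivates the comparison with an LLM.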

METHODS

The framework was developed on a training and testing set of 700 paragraphs mentioning 'gout', drawn from a random selection of EHR documents from a tertiary university hospital in Geneva, Switzerland. Two healthcare professionals manually reviewed all paragraphs and classified each as disease (true gout) or non-disease; this manual classification served as the gold standard. The LLM's accuracy was tested using few-shot and chain-of-thought prompting and compared with that of a regular expression (regex)-based method, with a focus on the effects of model parameters and prompt structure. The framework was further validated on 600 paragraphs mentioning calcium pyrophosphate deposition disease (CPPD).
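The few-shot, chain-of-thought setup can be sketched as two steps: assembling the prompt and parsing the answer. This is a minimal Python sketch under stated assumptions. The instruction wording, the two worked examples, the 'Answer: yes/no' output convention and the helper names (`build_prompt`, `parse_answer`) are all hypothetical. The call to Llama-3-8B itself is left out because the abstract does not describe how the model was served.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Example:
    paragraph: str   # French EHR paragraph containing "goutte"
    reasoning: str   # worked chain-of-thought shown to the model
    is_gout: bool    # gold-standard label: True = disease (true gout)


# Hypothetical few-shot examples; the study's actual prompt is not reproduced in the abstract.
FEW_SHOT = (
    Example("Crise de goutte au gros orteil, traitée par colchicine.",
            "An acute attack treated with colchicine describes the disease.", True),
    Example("Prélèvement d'une goutte de sang au bout du doigt.",
            "Here 'goutte' means a drop of blood, not the disease.", False),
)

# Hypothetical instruction wording and output convention.
INSTRUCTION = ("Decide whether the following French clinical paragraph states that the "
               "patient has gout (the disease). Think step by step, then end with "
               "'Answer: yes' or 'Answer: no'.")

ANSWER_RE = re.compile(r"answer:\s*(yes|no)\b", re.IGNORECASE)


def build_prompt(paragraph: str, examples: Sequence[Example] = FEW_SHOT) -> str:
    """Assemble a few-shot, chain-of-thought prompt for a single paragraph."""
    parts = [INSTRUCTION, ""]
    for ex in examples:
        parts += [f"Paragraph: {ex.paragraph}",
                  f"Reasoning: {ex.reasoning}",
                  f"Answer: {'yes' if ex.is_gout else 'no'}",
                  ""]
    # The target paragraph ends with an open "Reasoning:" slot for the model to continue.
    parts += [f"Paragraph: {paragraph}", "Reasoning:"]
    return "\n".join(parts)


def parse_answer(completion: str) -> Optional[bool]:
    """Return the last yes/no answer in the model's output, or None if there is none."""
    answers = ANSWER_RE.findall(completion)
    if not answers:
        return None
    return answers[-1].lower() == "yes"
```

In use, the prompt string would be passed to the model, and `parse_answer` would be applied to its completion. A `None` result (no parsable answer) needs an explicit policy, such as counting it as non-disease or re-querying the model.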

RESULTS

The LLM-based algorithm outperformed the regex method for gout, achieving a positive predictive value of 92.7% (88.7%-95.4%), a negative predictive value of 96.6% (94.6%-97.8%) and an accuracy of 95.4% (93.6%-96.7%). On the CPPD validation set, accuracy was 94.1% (90.2%-97.6%). The LLM framework performed well over a wide range of parameter values.
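The reported metrics come from a 2×2 comparison with the manual gold standard: PPV = TP/(TP+FP), NPV = TN/(TN+FN) and accuracy = (TP+TN)/N. Below is a minimal Python sketch. It assumes 95% Wilson score intervals (the abstract does not state which interval method was used) and uses made-up counts rather than the study's data.

```python
import math
from statistics import NormalDist
from typing import Dict, Tuple


def wilson_interval(successes: int, total: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (an assumed method, not the study's)."""
    if total <= 0:
        raise ValueError("total must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, Tuple[float, Tuple[float, float]]]:
    """PPV, NPV and accuracy versus the manual gold standard, each with a Wilson interval."""
    n = tp + fp + tn + fn
    return {
        "ppv": (tp / (tp + fp), wilson_interval(tp, tp + fp)),
        "npv": (tn / (tn + fn), wilson_interval(tn, tn + fn)),
        "accuracy": ((tp + tn) / n, wilson_interval(tp + tn, n)),
    }


# Made-up counts for illustration, not the study's confusion matrix.
for name, (value, (low, high)) in detection_metrics(tp=90, fp=10, tn=180, fn=20).items():
    print(f"{name}: {value:.1%} ({low:.1%}-{high:.1%})")
```

With these hypothetical counts, all three point estimates equal 90.0%. The accuracy interval is the narrowest because it rests on all 300 paragraphs. An exact Clopper-Pearson interval would give slightly different bounds.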

CONCLUSION

LLMs accurately detected disease diagnoses in EHRs, even in a language other than English. They could facilitate the creation of large disease registries in any language, improving the assessment of disease care and patient recruitment for clinical trials.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2305/11664341/3d8fac8a6832/rmdopen-10-4-g001.jpg
