Bürgisser Nils, Chalot Etienne, Mehouachi Samia, Buclin Clement P, Lauper Kim, Courvoisier Delphine S, Mongin Denis
Division of Rheumatology, Geneva University Hospitals, Geneva, Switzerland.
Division of Internal Medicine, Geneva University Hospitals, Geneva, Switzerland.
RMD Open. 2024 Dec 20;10(4):e005003. doi: 10.1136/rmdopen-2024-005003.
We propose and test a framework to detect disease diagnoses in French-language electronic health record (EHR) documents using a recent large language model (LLM), Meta's Llama-3-8B. Specifically, the study focuses on detecting gout ('goutte' in French), a ubiquitous French term that has multiple meanings beyond the disease. It compares the performance of the LLM-based framework with traditional natural language processing techniques and tests its dependence on the parameters used.
The framework was developed using a training and testing set of 700 paragraphs mentioning 'gout', drawn from a random selection of EHR documents from a tertiary university hospital in Geneva, Switzerland. All paragraphs were manually reviewed by two healthcare professionals and classified as disease (true gout) or non-disease; this classification served as the gold standard. The LLM's accuracy was tested using few-shot and chain-of-thought prompting and compared with a regular expression (regex)-based method, focusing on the effects of model parameters and prompt structure. The framework was further validated on 600 paragraphs mentioning 'Calcium Pyrophosphate Deposition Disease (CPPD)'.
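The two approaches compared above can be illustrated with a minimal sketch. The abstract gives neither the study's regex patterns nor its prompts, so everything below is an assumption made for illustration: the cue lists, the function names `regex_is_gout` and `build_few_shot_prompt`, and the worked examples are all hypothetical. The prompt builder only assembles text; how it is sent to Llama-3-8B is left out.

```python
import re

# Hypothetical cue lists, NOT the patterns used in the study.
NON_DISEASE = re.compile(
    r"\bgoutte\s+à\s+goutte\b"   # "drop by drop", e.g. a drip infusion
    r"|\b\d+\s*gouttes?\b"       # a dose counted in drops
    r"|\bgouttes\b",             # plural form, usually drops
    re.IGNORECASE,
)
DISEASE = re.compile(
    r"\b(?:crise|arthrite|antécédents?|diagnostic)\s+(?:de\s+)?goutte\b"
    r"|\bgoutte\s+(?:tophacée|chronique)\b",
    re.IGNORECASE,
)


def regex_is_gout(paragraph: str) -> bool:
    """Rule-based baseline: exclude drop-like usages, then require a disease cue."""
    if NON_DISEASE.search(paragraph):
        return False
    return bool(DISEASE.search(paragraph))


# Hypothetical few-shot examples: (paragraph, reasoning, label).
EXAMPLES = [
    ("Crise de goutte du gros orteil droit.",
     "The text describes an acute gout attack affecting a joint.", "yes"),
    ("Tramadol 10 gouttes 3x/j.",
     "Here 'gouttes' is a unit of medication dose, not a disease.", "no"),
]


def build_few_shot_prompt(paragraph: str, examples=EXAMPLES) -> str:
    """Assemble a few-shot, chain-of-thought prompt ending where the model should reason."""
    lines = [
        "Decide whether the paragraph states that the patient has gout (the disease).",
        "Reason step by step, then finish with 'Answer: yes' or 'Answer: no'.",
        "",
    ]
    for text, reasoning, label in examples:
        lines += [f"Paragraph: {text}", f"Reasoning: {reasoning}", f"Answer: {label}", ""]
    lines += [f"Paragraph: {paragraph}", "Reasoning:"]
    return "\n".join(lines)
```

The regex baseline has to enumerate the contexts in which 'goutte' means the disease, whereas the prompt delegates that disambiguation to the model through worked examples; this difference is what the comparison in the study evaluates.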
The LLM-based algorithm outperformed the regex method, achieving a 92.7% (88.7%-95.4%) positive predictive value, a 96.6% (94.6%-97.8%) negative predictive value and an accuracy of 95.4% (93.6%-96.7%) for gout. In the validation set on CPPD, accuracy was 94.1% (90.2%-97.6%). The LLM framework performed well over a wide range of parameter values.
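For readers less familiar with these metrics, the sketch below shows how positive predictive value, negative predictive value and accuracy follow from a confusion matrix of predictions against the gold standard. The abstract does not say how the intervals in parentheses were computed; the Wilson score interval used here is an assumption, as are the function names.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (assumed method, not stated in the abstract)."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """PPV, NPV and accuracy from confusion-matrix counts."""
    return {
        "ppv": tp / (tp + fp),            # predicted disease that is true disease
        "npv": tn / (tn + fn),            # predicted non-disease that is truly non-disease
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

Each metric is a simple proportion, so its interval can be obtained by passing the numerator and denominator to `wilson_interval`, for example `wilson_interval(tp, tp + fp)` for the PPV.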
LLMs accurately detected disease diagnoses from EHRs, even in non-English languages. They could facilitate the creation of large disease registers in any language, improving the assessment of disease care and patient recruitment for clinical trials.