Wiest Isabella Catharina, Ferber Dyke, Zhu Jiefu, van Treeck Marko, Meyer Sonja K, Juglan Radhika, Carrero Zunamys I, Paech Daniel, Kleesiek Jens, Ebert Matthias P, Truhn Daniel, Kather Jakob Nikolas
Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.
Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.
NPJ Digit Med. 2024 Sep 20;7(1):257. doi: 10.1038/s41746-024-01233-2.
Most clinical information is encoded as free text and is therefore not accessible for quantitative analysis. This study presents an open-source pipeline that uses the local large language model (LLM) "Llama 2" to extract quantitative information from clinical text, and evaluates its performance in identifying features of decompensated liver cirrhosis. The LLM identified five key clinical features in a zero- and one-shot manner from 500 patient medical histories in the MIMIC IV dataset. We compared LLMs of three sizes and various prompt engineering approaches, evaluating predictions against a ground truth established by three blinded medical experts. Our pipeline achieved high accuracy, detecting liver cirrhosis with 100% sensitivity and 96% specificity. Using the 70 billion parameter model, which outperformed the smaller versions, it also achieved high sensitivity and specificity for detecting ascites (95%, 95%), confusion (76%, 94%), abdominal pain (84%, 97%), and shortness of breath (87%, 97%). Our study demonstrates that locally deployed LLMs can extract clinical information from free text with low hardware requirements.
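The sensitivity and specificity figures reported in the abstract are per-feature comparisons of model output against expert labels. As a minimal sketch of how such a pair of values can be computed for one binary feature (for example "ascites present"), the following uses a hypothetical helper, `sensitivity_specificity`, and made-up toy labels; neither the function name nor the data come from the study, and the authors' own evaluation code may differ.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    truth: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for one binary feature.

    Hypothetical helper for illustration only; not taken from the study.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")

    sensitivity = tp / (tp + fn)  # share of true positives that were detected
    specificity = tn / (tn + fp)  # share of true negatives correctly rejected
    return sensitivity, specificity


# Toy labels (assumed, not study data): expert ground truth vs. model output.
expert = [True, True, True, True, False, False, False, False, False, False]
model = [True, True, True, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(expert, model)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy case the model misses one of four positive histories and flags one of six negative ones, so both values fall below 1. Applied to each of the five features in turn, the same calculation would yield one (sensitivity, specificity) pair per feature, which is the form in which the abstract reports its results.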