Skyles Ty J, Freeman Isaac J, Kalibbala Georgewilliam, Davila-Garcia David, Kiser Kendall, Raju Silpa, Wilcox Adam
Brigham Young University, Provo, UT.
Knox College, Galesburg, IL.
AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:518-526. eCollection 2025.
In large-scale clinical informatics, there is a need to maximize the amount of usable data from electronic health records. With the adoption of large language models (LLMs) in medical research, there is potential to use them to extract structured data from unstructured clinical notes. We explored how ChatGPT could be used to improve data availability in cancer research. We assessed how GPT used clinical notes to answer six relevant clinical questions. Four prompt engineering strategies were used: zero-shot, zero-shot with context, few-shot, and few-shot with context. Few-shot prompting often decreased the accuracy of GPT outputs, and context did not consistently improve accuracy. GPT extracted patients' Gleason scores and ages with an F1 score of 0.99, and it identified whether patients received palliative care and whether patients were in pain with an F1 score of 0.86. Effective use of LLMs has the potential to increase interoperability between healthcare and clinical research.
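The four prompting strategies and the F1 metric named in the abstract can be illustrated with a minimal sketch. The abstract does not give the study's actual prompts, model calls, or data, so everything here is an assumption made for illustration: the function names `build_prompt`, `four_strategies`, and `f1_score` are hypothetical, the prompt layout is only one plausible way to combine context and worked examples, and no model is called.

```python
from typing import Dict, List, Optional, Tuple


def build_prompt(
    note: str,
    question: str,
    context: Optional[str] = None,
    examples: Optional[List[Tuple[str, str]]] = None,
) -> str:
    """Assemble one prompt (hypothetical layout, not the study's wording).

    context  -- optional background text placed before everything else
    examples -- optional (note, answer) pairs shown as worked examples
    """
    parts: List[str] = []
    if context:
        parts.append(f"Context: {context}")
    for ex_note, ex_answer in examples or []:
        parts.append(f"Note: {ex_note}\nQuestion: {question}\nAnswer: {ex_answer}")
    # The note to be answered always comes last, with the answer left open.
    parts.append(f"Note: {note}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(parts)


def four_strategies(
    note: str, question: str, context: str, examples: List[Tuple[str, str]]
) -> Dict[str, str]:
    """Return one prompt per strategy named in the abstract."""
    return {
        "zero-shot": build_prompt(note, question),
        "zero-shot with context": build_prompt(note, question, context=context),
        "few-shot": build_prompt(note, question, examples=examples),
        "few-shot with context": build_prompt(
            note, question, context=context, examples=examples
        ),
    }


def f1_score(predicted: List[bool], actual: List[bool]) -> float:
    """F1 = harmonic mean of precision and recall for a yes/no question."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a sketch like this, each strategy's prompts would be sent to the model, the answers compared against manually labeled notes, and `f1_score` computed per question, which is one way a per-question figure such as 0.99 or 0.86 could be obtained.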