Performance Assessment of Large Language Models in Medical Consultation: Comparative Study.

Authors

Seo Sujeong, Kim Kyuli, Yang Heyoung

Affiliations

Future Technology Analysis Center, Korea Institute of Science and Technology Information, Seoul, Republic of Korea.

Postal Savings & Insurance Development Institute, Seoul, Republic of Korea.

Publication information

JMIR Med Inform. 2025 Feb 12;13:e64318. doi: 10.2196/64318.

Abstract

BACKGROUND

The recent introduction of generative artificial intelligence (AI) as an interactive consultant has sparked interest in evaluating its applicability in medical discussions and consultations, particularly within the domain of depression.

OBJECTIVE

This study evaluates the ability of large language models (LLMs) to generate responses to depression-related queries.

METHODS

Using the PubMedQA and QuoraQA data sets, we compared various LLMs, including BioGPT, PMC-LLaMA, GPT-3.5, and Llama2, and measured the similarity between the generated and original answers.
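
The comparison step described above can be illustrated with a small sketch. The abstract does not name the similarity metric the authors used, so the bag-of-words cosine similarity below is only an assumed stand-in, and the function names (tokenize, cosine_similarity, mean_similarity) are hypothetical rather than taken from the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(generated, reference):
    """Cosine similarity between bag-of-words count vectors of two texts.

    Assumption: this metric is illustrative; the abstract does not state
    which similarity measure the study applied.
    """
    a, b = Counter(tokenize(generated)), Counter(tokenize(reference))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty answer shares nothing with the reference
    return dot / (norm_a * norm_b)


def mean_similarity(pairs):
    """Average similarity over (generated, reference) answer pairs."""
    scores = [cosine_similarity(g, r) for g, r in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

Under this sketch, each model would be scored by passing its generated answers, paired with the original data set answers, to mean_similarity, and the per-model averages would then be compared.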

RESULTS

The latest general LLMs, GPT-3.5 and Llama2, exhibited superior performance, particularly in generating responses to medical inquiries from the PubMedQA data set.

CONCLUSIONS

Given the rapid advances in LLM development in recent years, it is hypothesized that version upgrades of general LLMs offer greater potential for improving the generation of "knowledge text" in the biomedical domain than fine-tuning for the biomedical field does. These findings are expected to contribute significantly to the evolution of AI-based medical counseling systems.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7102/11888074/3b154431e251/medinform_v13i1e64318_fig1.jpg
