Large language models' capabilities in responding to tuberculosis medical questions: testing ChatGPT, Gemini, and Copilot.

Authors

Dastani Meisam, Mardaneh Jalal, Rostamian Morteza

Affiliations

Infectious Diseases Research Center, Gonabad University of Medical Sciences, Gonabad, Iran.

Department of Microbiology, Infectious Diseases Research Center, School of Medicine, Gonabad University of Medical Sciences, Gonabad, Iran.

Publication information

Sci Rep. 2025 May 23;15(1):18004. doi: 10.1038/s41598-025-03074-9.

DOI:10.1038/s41598-025-03074-9

PMID:40410343

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12102205/

Abstract

This study aims to evaluate the capability of Large Language Models (LLMs) in responding to questions related to tuberculosis. Three large language models (ChatGPT, Gemini, and Copilot) were selected based on public accessibility criteria and their ability to respond to medical questions. Questions were designed across four main domains (diagnosis, treatment, prevention and control, and disease management). The responses were subsequently evaluated using DISCERN-AI and NLAT-AI assessment tools. ChatGPT achieved higher scores (4 out of 5) across all domains, while Gemini demonstrated superior performance in specific areas such as prevention and control with a score of 4.4. Copilot showed the weakest performance in disease management with a score of 3.6. In the diagnosis domain, all three models demonstrated equivalent performance (4 out of 5). According to the DISCERN-AI criteria, ChatGPT excelled in information relevance but showed deficiencies in providing sources and information production dates. All three models exhibited similar performance in balance and objectivity indicators. While all three models demonstrate acceptable capabilities in responding to medical questions related to tuberculosis, they share common limitations such as insufficient source citation and failure to acknowledge response uncertainties. Enhancement of these models could strengthen their role in providing medical information.

