

Large language models' capabilities in responding to tuberculosis medical questions: testing ChatGPT, Gemini, and Copilot.

Authors

Dastani Meisam, Mardaneh Jalal, Rostamian Morteza

Affiliations

Infectious Diseases Research Center, Gonabad University of Medical Sciences, Gonabad, Iran.

Department of Microbiology, Infectious Diseases Research Center, School of Medicine, Gonabad University of Medical Sciences, Gonabad, Iran.

Publication

Sci Rep. 2025 May 23;15(1):18004. doi: 10.1038/s41598-025-03074-9.

DOI: 10.1038/s41598-025-03074-9
PMID: 40410343
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12102205/
Abstract

This study aims to evaluate the capability of Large Language Models (LLMs) in responding to questions related to tuberculosis. Three large language models (ChatGPT, Gemini, and Copilot) were selected based on public accessibility criteria and their ability to respond to medical questions. Questions were designed across four main domains (diagnosis, treatment, prevention and control, and disease management). The responses were subsequently evaluated using DISCERN-AI and NLAT-AI assessment tools. ChatGPT achieved higher scores (4 out of 5) across all domains, while Gemini demonstrated superior performance in specific areas such as prevention and control with a score of 4.4. Copilot showed the weakest performance in disease management with a score of 3.6. In the diagnosis domain, all three models demonstrated equivalent performance (4 out of 5). According to the DISCERN-AI criteria, ChatGPT excelled in information relevance but showed deficiencies in providing sources and information production dates. All three models exhibited similar performance in balance and objectivity indicators. While all three models demonstrate acceptable capabilities in responding to medical questions related to tuberculosis, they share common limitations such as insufficient source citation and failure to acknowledge response uncertainties. Enhancement of these models could strengthen their role in providing medical information.
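The abstract reports per-domain mean scores on a 5-point scale (e.g. Gemini's 4.4 in prevention and control, Copilot's 3.6 in disease management). As a minimal sketch of how such domain means are tabulated from individual question ratings — the per-question numbers and helper name below are illustrative, not the paper's raw data, and are chosen only so the means mirror values quoted in the abstract:

```python
from statistics import mean

# Hypothetical per-question ratings (1-5 scale) for one model, grouped by
# the study's four domains; the individual values are illustrative.
ratings = {
    "diagnosis":              [4, 4, 4, 4, 4],
    "treatment":              [4, 5, 4, 3, 4],
    "prevention_and_control": [5, 4, 5, 4, 4],
    "disease_management":     [4, 3, 4, 4, 3],
}

def domain_means(per_question):
    """Mean rating per domain, rounded to one decimal place."""
    return {domain: round(mean(scores), 1)
            for domain, scores in per_question.items()}

means = domain_means(ratings)
print(means["prevention_and_control"])  # 4.4
print(means["disease_management"])      # 3.6
```

Repeating this tabulation per model and comparing the resulting dictionaries reproduces the kind of domain-by-domain comparison the abstract summarizes.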


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4fbd/12102205/b450d89a7766/41598_2025_3074_Fig1_HTML.jpg

Similar Articles

1. Large language models' capabilities in responding to tuberculosis medical questions: testing ChatGPT, Gemini, and Copilot. Sci Rep. 2025 May 23;15(1):18004. doi: 10.1038/s41598-025-03074-9.
2. Using large language models (ChatGPT, Copilot, PaLM, Bard, and Gemini) in Gross Anatomy course: Comparative analysis. Clin Anat. 2025 Mar;38(2):200-210. doi: 10.1002/ca.24244. Epub 2024 Nov 21.
3. Comparison of ChatGPT-4o, Google Gemini 1.5 Pro, Microsoft Copilot Pro, and Ophthalmologists in the management of uveitis and ocular inflammation: A comparative study of large language models. J Fr Ophtalmol. 2025 Apr;48(4):104468. doi: 10.1016/j.jfo.2025.104468. Epub 2025 Mar 13.
4. Evaluating the reliability of the responses of large language models to keratoconus-related questions. Clin Exp Optom. 2024 Oct 24:1-8. doi: 10.1080/08164622.2024.2419524.
5. Can American Board of Surgery in Training Examinations be passed by Large Language Models? Comparative assessment of Gemini, Copilot, and ChatGPT. Am Surg. 2025 May 12:31348251341956. doi: 10.1177/00031348251341956.
6. Can large language models provide accurate and quality information to parents regarding chronic kidney diseases? J Eval Clin Pract. 2024 Dec;30(8):1556-1564. doi: 10.1111/jep.14084. Epub 2024 Jul 3.
7. Proficiency, Clarity, and Objectivity of Large Language Models Versus Specialists' Knowledge on COVID-19's Impacts in Pregnancy: Cross-Sectional Pilot Study. JMIR Form Res. 2025 Feb 5;9:e56126. doi: 10.2196/56126.
8. Evaluation of Responses to Questions About Keratoconus Using ChatGPT-4.0, Google Gemini and Microsoft Copilot: A Comparative Study of Large Language Models on Keratoconus. Eye Contact Lens. 2025 Mar 1;51(3):e107-e111. doi: 10.1097/ICL.0000000000001158. Epub 2024 Dec 4.
9. Assessing the Responses of Large Language Models (ChatGPT-4, Claude 3, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Retinopathy of Prematurity: A Study on Readability and Appropriateness. J Pediatr Ophthalmol Strabismus. 2025 Mar-Apr;62(2):84-95. doi: 10.3928/01913913-20240911-05. Epub 2024 Oct 28.
10. Assessing the Responses of Large Language Models (ChatGPT-4, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Breast Imaging: A Study on Readability and Accuracy. Cureus. 2024 May 9;16(5):e59960. doi: 10.7759/cureus.59960. eCollection 2024 May.
