


Performance of Artificial Intelligence Chatbots on Ultrasound Examinations: Cross-Sectional Comparative Analysis.

Authors

Zhang Yong, Lu Xiao, Luo Yan, Zhu Ying, Ling Wenwu

Affiliations

Department of Medical Ultrasound, West China Hospital of Sichuan University, 37 Guoxue Alley, Chengdu, 610041, China, 86 18980605569.

Department of Thoracic Surgery, West China Hospital of Sichuan University, Chengdu, China.

Publication

JMIR Med Inform. 2025 Jan 9;13:e63924. doi: 10.2196/63924.

DOI: 10.2196/63924
PMID: 39814698
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11737282/
Abstract

BACKGROUND

Artificial intelligence chatbots are increasingly being used for medical inquiries, particularly in the field of ultrasound medicine. However, their performance varies and is influenced by factors such as language, question type, and topic.

OBJECTIVE

This study aimed to evaluate the performance of ChatGPT and ERNIE Bot in answering ultrasound-related medical examination questions, providing insights for users and developers.

METHODS

We curated 554 questions from ultrasound medicine examinations, covering various question types and topics. The questions were posed in both English and Chinese. Objective questions were scored based on accuracy rates, whereas subjective questions were rated by 5 experienced doctors using a Likert scale. The data were analyzed in Excel.
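The scoring scheme above can be sketched in code. This is an illustrative reconstruction, not the study's actual pipeline: the toy answer data, the 1-5 Likert scale, and the acceptability threshold of a mean rating of at least 4 are all assumptions made here for demonstration.

```python
# Hypothetical sketch of the Methods scoring (illustrative data only):
# objective questions are scored by accuracy rate; subjective answers are
# rated by 5 doctors on an assumed 1-5 Likert scale, with a mean rating
# >= 4 counted as "acceptable" (threshold assumed for illustration).

def accuracy_rate(answers, key):
    """Fraction of objective answers matching the answer key."""
    correct = sum(a == k for a, k in zip(answers, key))
    return correct / len(key)

def acceptability_rate(likert_ratings, threshold=4):
    """Fraction of subjective answers whose mean rating across
    the 5 raters meets the acceptability threshold."""
    acceptable = sum(sum(r) / len(r) >= threshold for r in likert_ratings)
    return acceptable / len(likert_ratings)

# Toy data: 5 objective answers and 3 subjective answers rated by 5 doctors.
obj = accuracy_rate(["A", "B", "C", "D", "A"], ["A", "B", "C", "A", "A"])
subj = acceptability_rate([[5, 4, 4, 5, 4], [3, 2, 3, 3, 2], [4, 4, 5, 4, 4]])
print(f"accuracy: {obj:.0%}, acceptability: {subj:.0%}")
# → accuracy: 80%, acceptability: 67%
```

Reporting both metrics as percentages matches how the Results section states them (e.g., accuracy rates of 8.33% to 80%).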

RESULTS

Of the 554 questions included in this study, single-choice questions comprised the largest share (354/554, 64%), followed by short answers (69/554, 12%) and noun explanations (63/554, 11%). The accuracy rates for objective questions ranged from 8.33% to 80%, with true or false questions scoring highest. Subjective questions received acceptability rates ranging from 47.62% to 75.36%. ERNIE Bot was superior to ChatGPT in many aspects (P<.05). Both models showed a performance decline in English, but ERNIE Bot's decline was less significant. The models performed better in terms of basic knowledge, ultrasound methods, and diseases than in terms of ultrasound signs and diagnosis.

CONCLUSIONS

Chatbots can provide valuable ultrasound-related answers, but performance differs by model and is influenced by language, question type, and topic. In general, ERNIE Bot outperforms ChatGPT. Users and developers should understand model performance characteristics and select appropriate models for different questions and languages to optimize chatbot use.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df4c/11737282/c606312081de/medinform-v13-e63924-g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df4c/11737282/240658ff8da4/medinform-v13-e63924-g002.jpg

Similar Articles

1
Physician Versus Large Language Model Chatbot Responses to Web-Based Questions From Autistic Patients in Chinese: Cross-Sectional Comparative Analysis.
J Med Internet Res. 2024 Apr 30;26:e54706. doi: 10.2196/54706.
2
Comparing the performance of ChatGPT and ERNIE Bot in answering questions regarding liver cancer interventional radiology in Chinese and English contexts: A comparative study.
Digit Health. 2025 Jan 23;11:20552076251315511. doi: 10.1177/20552076251315511. eCollection 2025 Jan-Dec.
3
Application value of generative artificial intelligence in the field of stomatology.
Hua Xi Kou Qiang Yi Xue Za Zhi. 2024 Dec 1;42(6):810-815. doi: 10.7518/hxkq.2024.2024144.
4
The performance of ChatGPT and ERNIE Bot in surgical resident examinations.
Int J Med Inform. 2025 Aug;200:105906. doi: 10.1016/j.ijmedinf.2025.105906. Epub 2025 Apr 4.
5
Assessing the performance of large language models (LLMs) in answering medical questions regarding breast cancer in the Chinese context.
Digit Health. 2024 Oct 7;10:20552076241284771. doi: 10.1177/20552076241284771. eCollection 2024 Jan-Dec.
6
The performance of artificial intelligence chatbot large language models to address skeletal biology and bone health queries.
J Bone Miner Res. 2024 Mar 22;39(2):106-115. doi: 10.1093/jbmr/zjad007.
7
Comparative assessment of artificial intelligence chatbots' performance in responding to healthcare professionals' and caregivers' questions about Dravet syndrome.
Epilepsia Open. 2025 Apr 1. doi: 10.1002/epi4.70022.
8
Assessing question characteristic influences on ChatGPT's performance and response-explanation consistency: Insights from Taiwan's Nursing Licensing Exam.
Int J Nurs Stud. 2024 May;153:104717. doi: 10.1016/j.ijnurstu.2024.104717. Epub 2024 Feb 8.
9
Using large language models (ChatGPT, Copilot, PaLM, Bard, and Gemini) in Gross Anatomy course: Comparative analysis.
Clin Anat. 2025 Mar;38(2):200-210. doi: 10.1002/ca.24244. Epub 2024 Nov 21.
