

Evaluation of the reliability and readability of chatbot responses as a patient information resource for the most common PET-CT scans.

Evaluación de la fiabilidad y legibilidad de las respuestas de los chatbots como recurso de información al paciente para las exploraciones PET-TC más comunes.

Authors

Aydinbelge-Dizdar N, Dizdar K

Affiliations

Department of Nuclear Medicine, Ankara Etlik City Hospital, Ankara, Turkiye.

Department of Software Engineering, ASELSAN Inc., Ankara, Turkiye.

Publication

Rev Esp Med Nucl Imagen Mol (Engl Ed). 2025 Jan-Feb;44(1):500065. doi: 10.1016/j.remnie.2024.500065. Epub 2024 Sep 28.

DOI: 10.1016/j.remnie.2024.500065
PMID: 39349172
Abstract

PURPOSE

This study aimed to evaluate the reliability and readability of responses generated by two popular AI-chatbots, 'ChatGPT-4.0' and 'Google Gemini', to potential patient questions about PET/CT scans.

MATERIALS AND METHODS

Thirty potential questions for each of [18F]FDG and [68Ga]Ga-DOTA-SSTR PET/CT, and twenty-nine potential questions for [68Ga]Ga-PSMA PET/CT were asked separately to ChatGPT-4 and Gemini in May 2024. The responses were evaluated for reliability and readability using the modified DISCERN (mDISCERN) scale, Flesch Reading Ease (FRE), Gunning Fog Index (GFI), and Flesch-Kincaid Reading Grade Level (FKRGL). The inter-rater reliability of mDISCERN scores provided by three raters (ChatGPT-4, Gemini, and a nuclear medicine physician) for the responses was assessed.
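The three readability indices named above have standard published formulas based on sentence length, syllables per word, and the share of complex (three-or-more-syllable) words. The sketch below implements those formulas with a rough vowel-group syllable heuristic; it is an illustration of how such scores are computed, not the (unspecified) tooling the study used.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per contiguous vowel group."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability(text: str) -> dict:
    """Return FRE, FKRGL, and GFI scores for an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    wps = len(words) / len(sentences)      # words per sentence
    spw = syllables / len(words)           # syllables per word
    return {
        # Flesch Reading Ease: higher = easier (90+ is very easy)
        "FRE": 206.835 - 1.015 * wps - 84.6 * spw,
        # Flesch-Kincaid Reading Grade Level: US school grade
        "FKRGL": 0.39 * wps + 11.8 * spw - 15.59,
        # Gunning Fog Index: years of education needed
        "GFI": 0.4 * (wps + 100 * complex_words / len(words)),
    }
```

Short, plain patient-facing sentences score high on FRE and low on FKRGL/GFI, which is the direction in which the study compares Gemini favorably to ChatGPT-4.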

RESULTS

The median [min-max] mDISCERN scores reviewed by the physician for responses about FDG, PSMA and DOTA PET/CT scans were 3.5 [2-4], 3 [3-4], 3 [3-4] for ChatGPT-4 and 4 [2-5], 4 [2-5], 3.5 [3-5] for Gemini, respectively. The mDISCERN scores assessed using ChatGPT-4 for answers about FDG, PSMA, and DOTA-SSTR PET/CT scans were 3.5 [3-5], 3 [3-4], 3 [2-3] for ChatGPT-4, and 4 [3-5], 4 [3-5], 4 [3-5] for Gemini, respectively. The mDISCERN scores evaluated using Gemini for responses about FDG, PSMA, and DOTA-SSTR PET/CTs were 3 [2-4], 2 [2-4], 3 [2-4] for ChatGPT-4, and 3 [2-5], 3 [1-5], 3 [2-5] for Gemini, respectively. The inter-rater reliability correlation coefficients of mDISCERN scores for ChatGPT-4 responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 0.629 (95% CI = 0.32-0.812), 0.707 (95% CI = 0.458-0.853) and 0.738 (95% CI = 0.519-0.866), respectively (p < 0.001). The correlation coefficients of mDISCERN scores for Gemini responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 0.824 (95% CI = 0.677-0.910), 0.881 (95% CI = 0.78-0.94) and 0.847 (95% CI = 0.719-0.922), respectively (p < 0.001). The mDISCERN scores assessed by ChatGPT-4, Gemini, and the physician showed that the chatbots' responses about all PET/CT scans had moderate to good statistical agreement according to the inter-rater reliability correlation coefficient (p < 0.001). There was a statistically significant difference in all readability scores (FKRGL, GFI, and FRE) of ChatGPT-4 and Gemini responses about PET/CT scans (p < 0.001). Gemini responses were shorter and had better readability scores than ChatGPT-4 responses.

CONCLUSION

There was an acceptable level of agreement between raters for the mDISCERN score, indicating agreement with the overall reliability of the responses. However, the information provided by AI-chatbots cannot be easily read by the public.


Similar Articles

1. Evaluación de la fiabilidad y legibilidad de las respuestas de los chatbots como recurso de información al paciente para las exploraciones PET-TC más comunes.
   Rev Esp Med Nucl Imagen Mol (Engl Ed). 2025 Jan-Feb;44(1):500065. doi: 10.1016/j.remnie.2024.500065. Epub 2024 Sep 28.

2. Reliability and readability analysis of ChatGPT-4 and Google Bard as a patient information source for the most commonly applied radionuclide treatments in cancer patients.
   Rev Esp Med Nucl Imagen Mol (Engl Ed). 2024 Jul-Aug;43(4):500021. doi: 10.1016/j.remnie.2024.500021. Epub 2024 May 29.

3. Assessing the quality and readability of patient education materials on chemotherapy cardiotoxicity from artificial intelligence chatbots: An observational cross-sectional study.
   Medicine (Baltimore). 2025 Apr 11;104(15):e42135. doi: 10.1097/MD.0000000000042135.

4. Readability, reliability and quality of responses generated by ChatGPT, gemini, and perplexity for the most frequently asked questions about pain.
   Medicine (Baltimore). 2025 Mar 14;104(11):e41780. doi: 10.1097/MD.0000000000041780.

5. Performance of Artificial Intelligence Chatbots in Responding to Patient Queries Related to Traumatic Dental Injuries: A Comparative Study.
   Dent Traumatol. 2025 Jun;41(3):338-347. doi: 10.1111/edt.13020. Epub 2024 Nov 22.

6. Assessing the readability, quality and reliability of responses produced by ChatGPT, Gemini, and Perplexity regarding most frequently asked keywords about low back pain.
   PeerJ. 2025 Jan 22;13:e18847. doi: 10.7717/peerj.18847. eCollection 2025.

7. Assessment of readability, reliability, and quality of ChatGPT®, BARD®, Gemini®, Copilot®, Perplexity® responses on palliative care.
   Medicine (Baltimore). 2024 Aug 16;103(33):e39305. doi: 10.1097/MD.0000000000039305.

8. Assessing the Readability of Patient Education Materials on Cardiac Catheterization From Artificial Intelligence Chatbots: An Observational Cross-Sectional Study.
   Cureus. 2024 Jul 4;16(7):e63865. doi: 10.7759/cureus.63865. eCollection 2024 Jul.

9. ChatGPT-4o's performance on pediatric Vesicoureteral reflux.
   J Pediatr Urol. 2025 Apr;21(2):504-509. doi: 10.1016/j.jpurol.2024.12.002. Epub 2024 Dec 7.

10. Assessing the readability, reliability, and quality of artificial intelligence chatbot responses to the 100 most searched queries about cardiopulmonary resuscitation: An observational study.
    Medicine (Baltimore). 2024 May 31;103(22):e38352. doi: 10.1097/MD.0000000000038352.