

How AI Responds to Common Lung Cancer Questions: ChatGPT vs Google Bard.

Affiliations

Department of Radiological Sciences, Division of Cardiothoracic Imaging, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.

School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA.

Publication Information

Radiology. 2023 Jun;307(5):e230922. doi: 10.1148/radiol.230922.

DOI: 10.1148/radiol.230922
PMID: 37310252
Abstract

Background The recent release of large language models (LLMs) for public use, such as ChatGPT and Google Bard, has opened up a multitude of potential benefits as well as challenges.

Purpose To evaluate and compare the accuracy and consistency of responses generated by the publicly available ChatGPT-3.5 and Google Bard to non-expert questions related to lung cancer prevention, screening, and terminology commonly used in radiology reports, based on the recommendations of the Lung Imaging Reporting and Data System (Lung-RADS) v2022 from the American College of Radiology and the Fleischner Society.

Materials and Methods Forty identical questions were created and presented to ChatGPT-3.5 and the Google Bard experimental version, as well as to the Bing and Google search engines, by three different authors of this paper. Each answer was reviewed by two radiologists for accuracy. Responses were scored as correct, partially correct, incorrect, or unanswered. Consistency was also evaluated, defined here as agreement among the three answers provided by each of ChatGPT-3.5, the Google Bard experimental version, Bing, and the Google search engine, regardless of whether the concept conveyed was correct or incorrect. Accuracy across the different tools was evaluated using Stata.

Results ChatGPT-3.5 answered all 120 questions, with 85 (70.8%) correct, 14 (11.7%) partially correct, and 21 (17.5%) incorrect. Google Bard did not answer 23 (19.1%) questions; among the 97 questions it did answer, 62 (51.7%) were correct, 11 (9.2%) were partially correct, and 24 (20%) were incorrect. Bing answered all 120 questions, with 74 (61.7%) correct, 13 (10.8%) partially correct, and 33 (27.5%) incorrect. The Google search engine answered all 120 questions, with 66 (55%) correct, 27 (22.5%) partially correct, and 27 (22.5%) incorrect. ChatGPT-3.5 was approximately 1.5 times more likely than Google Bard to provide a correct or partially correct answer (OR = 1.55, P = 0.004). ChatGPT-3.5 and the Google search engine were more likely to be consistent than Google Bard, by approximately 7-fold and 29-fold respectively (OR = 6.65, P = 0.002 for ChatGPT-3.5; OR = 28.83, P = 0.002 for the Google search engine).

Conclusion Although ChatGPT-3.5 had higher accuracy than the other tools, none of ChatGPT-3.5, Google Bard, Bing, or the Google search engine answered all questions correctly and with 100% consistency.
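The per-tool percentages in the Results follow directly from the reported counts (40 questions asked by 3 authors, so 120 responses per tool). A minimal Python sketch of that arithmetic; the dictionary names are illustrative labels, not identifiers from the paper:

```python
# Recompute the per-tool response percentages reported in the Results
# section from the raw counts (40 questions x 3 authors = 120 responses
# per tool). Tool and category names are illustrative labels only.
TOTAL = 120

counts = {
    "ChatGPT-3.5":   {"correct": 85, "partial": 14, "incorrect": 21, "unanswered": 0},
    "Google Bard":   {"correct": 62, "partial": 11, "incorrect": 24, "unanswered": 23},
    "Bing":          {"correct": 74, "partial": 13, "incorrect": 33, "unanswered": 0},
    "Google search": {"correct": 66, "partial": 27, "incorrect": 27, "unanswered": 0},
}

def pct(n: int) -> float:
    """Share of the 120 responses, as a percentage rounded to one decimal."""
    return round(100 * n / TOTAL, 1)

for tool, c in counts.items():
    # Each tool received all 120 prompts, so the four categories sum to 120.
    assert sum(c.values()) == TOTAL
    print(tool, {category: pct(n) for category, n in c.items()})
```

The output matches the abstract's figures (e.g. 85/120 → 70.8%, 62/120 → 51.7%), except that 23/120 prints as 19.2% where the abstract reports 19.1%, a rounding difference in the published text.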


Similar Articles

1. How AI Responds to Common Lung Cancer Questions: ChatGPT vs Google Bard.
   Radiology. 2023 Jun;307(5):e230922. doi: 10.1148/radiol.230922.
2. Assessing the Accuracy of Information on Medication Abortion: A Comparative Analysis of ChatGPT and Google Bard AI.
   Cureus. 2024 Jan 2;16(1):e51544. doi: 10.7759/cureus.51544. eCollection 2024 Jan.
3. Large Language Models in Hematology Case Solving: A Comparative Study of ChatGPT-3.5, Google Bard, and Microsoft Bing.
   Cureus. 2023 Aug 21;15(8):e43861. doi: 10.7759/cureus.43861. eCollection 2023 Aug.
4. Evaluation of the Performance of Generative AI Large Language Models ChatGPT, Google Bard, and Microsoft Bing Chat in Supporting Evidence-Based Dentistry: Comparative Mixed Methods Study.
   J Med Internet Res. 2023 Dec 28;25:e51580. doi: 10.2196/51580.
5. Performance of artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in the American Society for Metabolic and Bariatric Surgery textbook of bariatric surgery questions.
   Surg Obes Relat Dis. 2024 Jul;20(7):609-613. doi: 10.1016/j.soard.2024.04.014. Epub 2024 May 8.
6. Performance evaluation of ChatGPT, GPT-4, and Bard on the official board examination of the Japan Radiology Society.
   Jpn J Radiol. 2024 Feb;42(2):201-207. doi: 10.1007/s11604-023-01491-2. Epub 2023 Oct 4.
7. Comparison of Large Language Models in Answering Immuno-Oncology Questions: A Cross-Sectional Study.
   medRxiv. 2023 Oct 31:2023.10.31.23297825. doi: 10.1101/2023.10.31.23297825.
8. Performance of Large Language Models (ChatGPT, Bing Search, and Google Bard) in Solving Case Vignettes in Physiology.
   Cureus. 2023 Aug 4;15(8):e42972. doi: 10.7759/cureus.42972. eCollection 2023 Aug.
9. Assessing the Capability of ChatGPT, Google Bard, and Microsoft Bing in Solving Radiology Case Vignettes.
   Indian J Radiol Imaging. 2023 Dec 29;34(2):276-282. doi: 10.1055/s-0043-1777746. eCollection 2024 Apr.
10. Evidence-based potential of generative artificial intelligence large language models in orthodontics: a comparative study of ChatGPT, Google Bard, and Microsoft Bing.
    Eur J Orthod. 2024 Apr 13. doi: 10.1093/ejo/cjae017.

Cited By

1. Reducing Hallucinations and Trade-Offs in Responses in Generative AI Chatbots for Cancer Information: Development and Evaluation Study.
   JMIR Cancer. 2025 Sep 11;11:e70176. doi: 10.2196/70176.
2. A Comparative Study of Five Large Language Models' Response for Liver Cancer Comprehensive Treatment.
   J Hepatocell Carcinoma. 2025 Aug 20;12:1861-1871. doi: 10.2147/JHC.S531642. eCollection 2025.
3. Expert evaluation of ChatGPT accuracy and reliability for basic celiac disease frequently asked questions.
   Sci Rep. 2025 Aug 14;15(1):29871. doi: 10.1038/s41598-025-15898-6.
4. Artificial intelligence across the cancer care continuum.
   Cancer. 2025 Aug 15;131(16):e70050. doi: 10.1002/cncr.70050.
5. Assessing the Role of Large Language Models Between ChatGPT and DeepSeek in Asthma Education for Bilingual Individuals: Comparative Study.
   JMIR Med Inform. 2025 Aug 13;13:e65365. doi: 10.2196/65365.
6. Assessing ChatGPT's Educational Potential in Lung Cancer Radiotherapy From Clinician and Patient Perspectives: Content Quality and Readability Analysis.
   JMIR Cancer. 2025 Aug 13;11:e69783. doi: 10.2196/69783.
7. Development and evaluation of large-language models (LLMs) for oncology: A scoping review.
   PLOS Digit Health. 2025 Aug 7;4(8):e0000980. doi: 10.1371/journal.pdig.0000980. eCollection 2025 Aug.
8. Assessing Information Provided by ChatGPT: Heart Failure Versus Patent Ductus Arteriosus.
   Cureus. 2025 Jun 19;17(6):e86365. doi: 10.7759/cureus.86365. eCollection 2025 Jun.
9. Large language model integrations in cancer decision-making: a systematic review and meta-analysis.
   NPJ Digit Med. 2025 Jul 17;8(1):450. doi: 10.1038/s41746-025-01824-7.
10. Optimizing patient education for radioactive iodine therapy and the role of ChatGPT incorporating chain-of-thought technique: ChatGPT questionnaire.
    Digit Health. 2025 Jul 7;11:20552076251357468. doi: 10.1177/20552076251357468. eCollection 2025 Jan-Dec.