Comparative Study to Evaluate the Accuracy of Differential Diagnosis Lists Generated by Gemini Advanced, Gemini, and Bard for a Case Report Series Analysis: Cross-Sectional Study.

Author information

Hirosawa Takanobu, Harada Yukinori, Tokumasu Kazuki, Ito Takahiro, Suzuki Tomoharu, Shimizu Taro

Affiliations

Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga, Japan.

Department of General Medicine, Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama University, Okayama, Japan.

Publication information

JMIR Med Inform. 2024 Oct 2;12:e63010. doi: 10.2196/63010.

Abstract

BACKGROUND

Generative artificial intelligence (GAI) systems by Google have recently been updated from Bard to Gemini and Gemini Advanced as of December 2023. Gemini is a basic, free-to-use model after a user's login, while Gemini Advanced operates on a more advanced model requiring a fee-based subscription. These systems have the potential to enhance medical diagnostics. However, the impact of these updates on comprehensive diagnostic accuracy remains unknown.

OBJECTIVE

This study aimed to compare the accuracy of the differential diagnosis lists generated by Gemini Advanced, Gemini, and Bard across comprehensive medical fields using case report series.

METHODS

We identified a case report series with relevant final diagnoses published in the American Journal of Case Reports from January 2022 to March 2023. After excluding nondiagnostic cases and patients aged 10 years and younger, we included the remaining case reports. After condensing the case sections into case descriptions, we input the same case descriptions into Gemini Advanced, Gemini, and Bard to generate top 10 differential diagnosis lists. Two expert physicians independently evaluated whether the final diagnosis was included in each list and, if so, its rank. Any discrepancies were resolved by a third expert physician. Bonferroni correction was applied to adjust the P values for the number of comparisons among the 3 GAI systems, setting the corrected significance level at P value <.02.
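The corrected significance level can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a conventional family-wise alpha of 0.05 (the abstract does not state the uncorrected level) and one comparison per pair of systems; the variable names are illustrative only.

```python
from itertools import combinations

# The three GAI systems compared in the study.
systems = ["Gemini Advanced", "Gemini", "Bard"]

# One comparison per unordered pair of systems.
pairs = list(combinations(systems, 2))

# Assumption: family-wise alpha of 0.05 (not stated in the abstract).
family_alpha = 0.05

# Bonferroni correction: divide alpha by the number of comparisons.
corrected_alpha = family_alpha / len(pairs)

print(len(pairs))                 # 3
print(round(corrected_alpha, 4))  # 0.0167
```

Rounded to two decimals, 0.0167 becomes the .02 threshold quoted above, so a reported P value of .02 may still sit below the unrounded threshold.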

RESULTS

In total, 392 case reports were included. The inclusion rates of the final diagnosis within the top 10 differential diagnosis lists were 73% (286/392) for Gemini Advanced, 76.5% (300/392) for Gemini, and 68.6% (269/392) for Bard. The top diagnoses matched the final diagnoses in 31.6% (124/392) for Gemini Advanced, 42.6% (167/392) for Gemini, and 31.4% (123/392) for Bard. Gemini demonstrated higher diagnostic accuracy than Bard both within the top 10 differential diagnosis lists (P=.02) and as the top diagnosis (P=.001). In addition, Gemini Advanced achieved significantly lower accuracy than Gemini in identifying the most probable diagnosis (P=.002).
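The reported percentages can be recomputed from the counts, and the pairwise comparisons can be approximated with a 2x2 chi-square test. This is a sketch under stated assumptions, not the authors' analysis: the abstract does not name the statistical test, so the Yates continuity correction and the treatment of the three systems as independent groups (although they were run on the same cases) are both assumptions. The helper names `percent` and `chi2_yates_p` are hypothetical.

```python
from math import erfc, sqrt

N_CASES = 392  # case reports included in the study

# Counts taken from the Results paragraph.
top10 = {"Gemini Advanced": 286, "Gemini": 300, "Bard": 269}
top1 = {"Gemini Advanced": 124, "Gemini": 167, "Bard": 123}


def percent(hits, n=N_CASES):
    """Share of cases with a correct diagnosis, as a percentage."""
    return round(100 * hits / n, 1)


def chi2_yates_p(hits_a, hits_b, n=N_CASES):
    """Two-sided P value for a 2x2 table with Yates correction.

    Assumption: the two systems are treated as independent groups
    of n cases each; the abstract does not specify the test used.
    """
    a, b = hits_a, n - hits_a  # system A: correct, incorrect
    c, d = hits_b, n - hits_b  # system B: correct, incorrect
    total = 2 * n
    numerator = total * max(0.0, abs(a * d - b * c) - total / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # Survival function of chi-square with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


for name in top10:
    print(name, percent(top10[name]), percent(top1[name]))

print("Gemini vs Bard, top 10:", chi2_yates_p(top10["Gemini"], top10["Bard"]))
print("Gemini vs Bard, top 1:", chi2_yates_p(top1["Gemini"], top1["Bard"]))
print("Gemini vs Gemini Advanced, top 1:",
      chi2_yates_p(top1["Gemini"], top1["Gemini Advanced"]))
```

Under these assumptions the P values should land close to the reported ones. A paired test such as McNemar's would be the more natural choice for systems evaluated on identical cases, but it requires per-case agreement data that the abstract does not provide.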

CONCLUSIONS

The results of this study suggest that Gemini outperformed Bard in diagnostic accuracy following the model update. However, Gemini Advanced requires further refinement to optimize its performance for future artificial intelligence-enhanced diagnostics. These findings should be interpreted cautiously and considered primarily for research purposes, as these GAI systems have not been adjusted for medical diagnostics nor approved for clinical use.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c9d/11483254/cad522a35646/medinform_v12i1e63010_fig1.jpg
