University Hospitals Birmingham NHS Foundation Trust, Solihull, UK.
University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK.
J Eur Acad Dermatol Venereol. 2024 Dec;38(12):2235-2239. doi: 10.1111/jdv.20237. Epub 2024 Jul 12.
Artificial intelligence (AI) tools have the potential to revolutionize many facets of medicine and medical sciences research. Numerous AI tools have been developed, and their functionality continues to improve through iteration.
This study aimed to assess the performance of three AI tools (The Literature, Microsoft's Copilot and Google's Gemini) in writing literature reviews on a range of dermatology topics.
Each tool was asked to write a literature review on five topics, each of which has recently been the subject of a peer-reviewed systematic review. The outputs of each tool were graded on evidence and analysis, conclusions and references, each criterion on a 5-point Likert scale, by three dermatologists who work in clinical practice, have completed the UK dermatology postgraduate training examination and partake in continuing professional development.
Across all five topics, the literature reviews written by Gemini scored highest. Gemini's mean score per review was 10.53, significantly higher than the mean scores achieved by The Literature (7.73) and Copilot (7.4) (p < 0.001).
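The abstract does not state which statistical test produced the p-value above. As a hedged illustration only, the sketch below assumes a Kruskal-Wallis test on per-review total scores (three criteria, each on a 5-point Likert scale, for a maximum of 15 per review); the score values are hypothetical placeholders chosen only so that the group means sit near those reported, not data from the study.

```python
# Illustrative sketch only: the study does not specify its statistical test,
# and these per-review totals are hypothetical placeholders.
from scipy import stats

# Hypothetical total scores per review (max 15: three criteria x 5-point Likert).
gemini = [11, 10, 11, 10, 11]      # placeholder mean 10.6 (reported: 10.53)
the_literature = [8, 7, 8, 8, 8]   # placeholder mean 7.8 (reported: 7.73)
copilot = [7, 8, 7, 8, 7]          # placeholder mean 7.4 (reported: 7.4)

# Kruskal-Wallis H-test: a non-parametric comparison of the three groups,
# a reasonable choice for ordinal Likert-derived scores (an assumption,
# not the authors' stated method).
h_stat, p_value = stats.kruskal(gemini, the_literature, copilot)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```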
This paper shows that AI-generated literature reviews can provide real-time summaries of medical literature across a range of dermatology topics, but limitations to their comprehensiveness and accuracy are apparent.