Uppalapati Vamsi Krishna, Nag Deb Sanjay
Department of Anesthesiology, Tata Main Hospital, Jamshedpur, IND.
Cureus. 2024 Jan 18;16(1):e52485. doi: 10.7759/cureus.52485. eCollection 2024 Jan.
This study rigorously evaluates the performance of four artificial intelligence (AI) language models - ChatGPT, Claude AI, Google Bard, and Perplexity AI - across four key metrics: accuracy, relevance, clarity, and completeness. We used a robust mix of research methods, gathering evaluations across 14 scenarios, which helped ensure that our findings were accurate and dependable. The study showed that Claude AI outperformed the other models by providing the most complete responses, with mean scores of 3.64 for relevance and 3.43 for completeness. ChatGPT performed consistently well, whereas Google Bard's responses were often unclear and varied greatly, making them difficult to interpret and showing no consistency. These results offer important insight into the strengths and weaknesses of AI language models for medical recommendations; they can guide more effective use of these tools and inform future AI-driven developments. The study shows the extent to which current AI capabilities align with complex medical scenarios.