AI in conjunctivitis research: assessing ChatGPT and DeepSeek for etiology, intervention, and citation integrity via hallucination rate analysis.

Authors

Hasnain Muhammad, Aurangzeb Khursheed, Alhussein Musaed, Ghani Imran, Mahmood Muhammad Hamza

Affiliations

Department of Computer Science, Lahore Leads University, Lahore, Pakistan.

Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.

Publication information

Front Artif Intell. 2025 Aug 20;8:1579375. doi: 10.3389/frai.2025.1579375. eCollection 2025.

DOI:10.3389/frai.2025.1579375

PMID:40910118

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12405273/

Abstract

INTRODUCTION

Large language models and their applications have gained significant attention owing to their strengths in natural language processing.

METHODS

In this study, ChatGPT and DeepSeek are used as AI models to assist in diagnosis based on their responses to clinical questions. In addition, ChatGPT, Claude, and DeepSeek are used to analyze images to assess their potential diagnostic capabilities, applying the various sensitivity analyses described. We employ prompt engineering techniques and evaluate the models' ability to generate high-quality responses. We propose several prompts and use them to elicit key information on conjunctivitis.

RESULTS

Our findings show that DeepSeek excels at providing precise and comprehensive information on specific topics related to conjunctivitis, with detailed explanations and in-depth medical insights. In contrast, ChatGPT provides generalized public information on the infection, which makes it more suitable for broader, less technical discussions. DeepSeek achieved a lower hallucination rate (7%) than ChatGPT (13%). In binary classification, Claude achieved 100% accuracy, significantly outperforming ChatGPT's 62.5%.
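The abstract reports percentages without stating how they were computed or how many responses and images were evaluated. As a minimal sketch, assuming the hallucination rate is the share of responses judged to contain a hallucination and accuracy is the share of correct binary predictions, the two metrics could be computed as follows. The function names and all counts are hypothetical and are not taken from the study.

```python
def percentage(count: int, total: int) -> float:
    """Return count / total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


def hallucination_rate(flags: list[bool]) -> float:
    """flags[i] is True when response i was judged to contain a hallucination.

    Assumed definition: flagged responses divided by all responses.
    """
    return percentage(sum(flags), len(flags))


def binary_accuracy(predicted: list[int], actual: list[int]) -> float:
    """Share of predictions that match the reference labels, as a percentage."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return percentage(correct, len(actual))


# Illustrative counts only (not from the study): 7 flagged responses out of 100.
example_flags = [True] * 7 + [False] * 93
example_rate = hallucination_rate(example_flags)

# Illustrative counts only: 5 of 8 predictions match the reference labels.
example_predicted = [1, 1, 1, 1, 1, 0, 0, 0]
example_actual = [1, 1, 1, 1, 1, 1, 1, 1]
example_accuracy = binary_accuracy(example_predicted, example_actual)
```

With small evaluation sets, a difference of a single item shifts either percentage by several points, so the underlying counts matter when comparing models.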

DISCUSSION

DeepSeek showed limited performance in interpreting the conjunctivitis image dataset. This comparative analysis offers a useful reference for scholars and health professionals applying these models in varying medical contexts.
