Yu Chang, Fan Jianhua, Chen Yu, Shen Weihua, Zhang Yini, Li Ling, Wen Jiasheng, Chen Xiaoli
Department of Infection Management, Kunshan Integrated Traditional Chinese and Western Medicine Hospital, Suzhou, Jiangsu, China.
Department of Cardiology, Kunshan Hospital of Traditional Chinese Medicine, Suzhou, Jiangsu, China.
Medicine (Baltimore). 2026 Jan 30;105(5):e47493. doi: 10.1097/MD.0000000000047493.
This study compared the effectiveness of 2 artificial intelligence (AI) models, ChatGPT 4o and DeepSeek, in responding to questions about cardiovascular implantable electronic device (CIED) infections, focusing on their accuracy and readability, both critical for use in clinical settings. A comparative analysis was conducted using 30 questions based on the American Heart Association's guidelines for CIED-related infections. Each question was posed to both AI models under 2 conditions: once without additional context and once with guideline-based prompts. Accuracy was assessed by 2 cardiovascular experts using a 4-level grading scale, and readability was measured with the Flesch-Kincaid Grade Level score and word count. Without guideline prompts, ChatGPT 4o provided comprehensive answers for 24 of 30 questions (80.00%), with 5 correct but incomplete answers (16.67%) and 1 partially correct answer (3.33%); DeepSeek also provided comprehensive answers for 24 questions (80.00%), with 6 correct but incomplete answers (20.00%). With guideline prompts, ChatGPT 4o's comprehensive-answer rate increased to 93.33% (28/30), and DeepSeek's rose to 90.00% (27/30). No significant difference in overall accuracy was found between the models (P = .34). In terms of readability, ChatGPT 4o produced longer responses (859.10 ± 235.90 words) than DeepSeek (526.27 ± 100.45 words), a statistically significant difference (P < .01). ChatGPT 4o's Flesch-Kincaid Grade Level score (15.40 ± 1.18) was also higher than DeepSeek's (13.91 ± 1.42), indicating more complex responses (P < .01). With guideline prompts, ChatGPT 4o's mean word count dropped to 624.00 ± 249.01, while DeepSeek's changed little at 549.43 ± 117.40; these changes were not statistically significant (P = .13). Slight improvements in readability with guideline prompts were likewise not statistically significant (P = .11). Both AI models demonstrated the ability to provide accurate, clinically relevant information for managing CIED infections, and guideline-based prompts improved the completeness of their responses. ChatGPT 4o provided more detailed answers, while DeepSeek produced more concise, potentially easier-to-understand output.
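For readers who want to reproduce the readability metrics, the sketch below computes per-response word counts and Flesch-Kincaid Grade Level (FKGL) scores and compares the 2 models with a two-sample Welch's t-test. It is a minimal reconstruction under stated assumptions: the paper does not publish its analysis code, the exact statistical test is not named in the abstract, and the sample responses and helper names (count_syllables, fk_grade, summarize) are hypothetical, not the authors' implementation.

import re
from statistics import mean, stdev

from scipy.stats import ttest_ind  # Welch's t-test for the mean comparison


def count_syllables(word: str) -> int:
    # Crude vowel-group heuristic; dedicated readability tools use
    # dictionary-based syllable counts and will differ slightly.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fk_grade(text: str) -> float:
    # FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59


def summarize(responses: list[str]) -> tuple[list[int], list[float]]:
    # Per-response word counts and FKGL scores; the abstract reports
    # these as mean +/- SD over the 30 questions.
    counts = [len(re.findall(r"[A-Za-z']+", r)) for r in responses]
    return counts, [fk_grade(r) for r in responses]


# Hypothetical stand-ins for the 30 collected answers per model.
gpt4o_responses = [
    "Complete device removal is recommended for definite CIED infection. "
    "Empiric vancomycin should be started after blood cultures are drawn.",
    "Transesophageal echocardiography is indicated when bacteremia persists. "
    "Reimplantation timing depends on blood culture clearance.",
]
deepseek_responses = [
    "Remove the entire system and give targeted antibiotics.",
    "Obtain blood cultures first, then start empiric therapy.",
]

gpt_counts, gpt_grades = summarize(gpt4o_responses)
ds_counts, ds_grades = summarize(deepseek_responses)
print(f"ChatGPT 4o: {mean(gpt_counts):.2f} ± {stdev(gpt_counts):.2f} words, "
      f"FKGL {mean(gpt_grades):.2f} ± {stdev(gpt_grades):.2f}")
print(f"DeepSeek:   {mean(ds_counts):.2f} ± {stdev(ds_counts):.2f} words, "
      f"FKGL {mean(ds_grades):.2f} ± {stdev(ds_grades):.2f}")

# equal_var=False selects Welch's t-test, robust to unequal group variances.
t_stat, p_val = ttest_ind(gpt_counts, ds_counts, equal_var=False)
print(f"word count: t = {t_stat:.2f}, P = {p_val:.3f}")

The FKGL constant maps the score onto US school grade levels, so the reported means of 15.40 (ChatGPT 4o) and 13.91 (DeepSeek) both correspond to college-level reading, with ChatGPT 4o's output roughly a grade and a half harder.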