Comparative Analysis of Artificial Intelligence Platforms: ChatGPT-3.5 and GoogleBard in Identifying Red Flags of Low Back Pain.

Author Information

Yilmaz Muluk Selkin, Olcucu Nazli

Affiliations

Physical Medicine and Rehabilitation, Antalya City Hospital, Antalya, TUR.

Physical Medicine and Rehabilitation, Antalya Ataturk State Hospital, Antalya, TUR.

Publication Information

Cureus. 2024 Jul 1;16(7):e63580. doi: 10.7759/cureus.63580. eCollection 2024 Jul.

Abstract

BACKGROUND

Low back pain (LBP) is a prevalent healthcare concern that is frequently responsive to conservative treatment. However, it can also stem from severe conditions, marked by 'red flags' (RF) such as malignancy, cauda equina syndrome, fractures, infections, spondyloarthropathies, and aneurysm rupture, which physicians should be vigilant about. Given the increasing reliance on online health information, this study assessed ChatGPT-3.5's (OpenAI, San Francisco, CA, USA) and GoogleBard's (Google, Mountain View, CA, USA) accuracy in responding to RF-related LBP questions and their capacity to discriminate the severity of the condition.

METHODS

We created 70 questions on RF-related symptoms and diseases following the LBP guidelines. Of these, 58 addressed a single symptom (SS) of LBP and 12 addressed multiple symptoms (MS). The questions were posed to ChatGPT-3.5 and GoogleBard, and two authors rated each response for accuracy, completeness, and relevance (ACR) on a 5-point rubric.

RESULTS

Cohen's kappa values (0.60-0.81) indicated substantial agreement between the two authors. Mean response scores for the 58 SS questions ranged from 3.47 to 3.85 for ChatGPT-3.5 and from 3.36 to 3.76 for GoogleBard; for the 12 MS questions, they ranged from 4.04 to 4.29 for ChatGPT-3.5 and from 3.50 to 3.71 for GoogleBard. These ratings correspond to 'good' to 'excellent'. Most SS responses effectively conveyed the severity of the situation (93.1% for ChatGPT-3.5, 94.8% for GoogleBard), and all MS responses did so. No statistically significant differences were found between ChatGPT-3.5 and GoogleBard scores (p>0.05).
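Cohen's kappa corrects raw agreement for chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of identical ratings and p_e is the agreement expected from each rater's marginal score distribution. A minimal Python sketch of the calculation, using hypothetical 5-point rubric scores rather than the study's actual ratings:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical labels."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 5-point rubric scores from two reviewers on ten responses.
a = [4, 3, 5, 4, 2, 4, 3, 5, 4, 3]
b = [4, 3, 4, 4, 2, 4, 3, 5, 5, 3]
print(round(cohens_kappa(a, b), 2))  # → 0.71
```

A value in the 0.61-0.80 band is conventionally read as "substantial" agreement, which matches the range the study reports.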

CONCLUSIONS

In an era characterized by widespread online health information seeking, artificial intelligence (AI) systems play a vital role in delivering precise medical information. These technologies may hold promise in the field of health information if they continue to improve.
