Evaluation of the Performance of Large Language Models in the Management of Axial Spondyloarthropathy: Analysis of EULAR 2022 Recommendations.

Authors

Usen Ahmet, Kuculmez Ozlem

Affiliations

Department of Physical Medicine and Rehabilitation, Medipol University, Istanbul 34810, Turkey.

Department of Physical Medicine and Rehabilitation, Baskent University Alanya Hospital, Antalya 07400, Turkey.

Publication information

Diagnostics (Basel). 2025 Jun 7;15(12):1455. doi: 10.3390/diagnostics15121455.

DOI: 10.3390/diagnostics15121455
PMID: 40564776
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12192445/
Abstract

Background/Objectives: Guidelines are of great importance in managing complex and chronic conditions such as axial spondyloarthropathy. The aim of this study was to compare the answers given by various large language models to open-ended questions derived from the ASAS-EULAR 2022 guidance. Methods: This was a cross-sectional, comparative study. A total of 15 recommendations in the ASAS-EULAR 2022 guideline were converted directly into open-ended questions set in a clinical context. Each question was posed to the ChatGPT-3.5, GPT-4o, and Gemini 2.0 Flash models, and the answers were rated on a seven-point Likert scale for usability and reliability; readability was assessed with the Flesch-Kincaid Reading Ease (FKRE) and Flesch-Kincaid Grade Level (FKGL) metrics, and semantic and surface-level similarity with the Universal Sentence Encoder (USE) and ROUGE-L, respectively. The results of the different large language models were compared statistically, and p < 0.05 was considered statistically significant. Results: Google Gemini 2.0 achieved better FKRE and FKGL scores (p < 0.05). Reliability and usefulness scores were significantly higher for ChatGPT-4o and Gemini 2.0 (p < 0.05). ChatGPT-4o yielded significantly higher semantic similarity scores than ChatGPT-3.5 (p < 0.05). There was a negative correlation between FKRE and FKGL scores and a positive correlation between reliability and usefulness scores (p < 0.05). Conclusions: ChatGPT-4o and Gemini 2.0 gave more reliable, useful, and readable answers to open-ended questions derived from the ASAS-EULAR 2022 guidelines. These models may help support treatment decision-making in rheumatology in the future.
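For readers unfamiliar with the text metrics named in the abstract, the following is a minimal Python sketch of how readability and surface-level similarity scores of this kind are commonly computed. It is not the authors' pipeline: the abstract does not state which tools were used. The syllable counter is a rough heuristic, the function names (`count_syllables`, `readability`, `rouge_l_f1`) are hypothetical, and the choice of the F1 variant of ROUGE-L is an assumption. The constants are those of the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas.

```python
import re


def count_syllables(word: str) -> int:
    """Heuristic syllable count (assumption): vowel groups, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (reading ease, grade level) using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return ease, grade


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L as the F1 of LCS-based precision and recall (whitespace tokens)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


ease, grade = readability("The cat sat on the mat.")
overlap = rouge_l_f1("the cat sat on the mat", "the cat is on the mat")
```

Both readability formulas use the same two ratios (words per sentence and syllables per word) with opposite signs, so text that scores higher on reading ease scores lower on grade level. This is consistent with the negative correlation between FKRE and FKGL reported in the abstract.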
