
High accuracy but limited readability of large language model-generated responses to frequently asked questions about Kienböck's disease.

Affiliations

School of Medicine, Department of Orthopaedics and Traumatology, Division of Hand Surgery, University of Mersin, Mersin, 33110, Turkey.

School of Medicine, Department of Orthopedics and Traumatology, Ömer Halisdemir University, Niğde, Turkey.

Publication Information

BMC Musculoskelet Disord. 2024 Nov 4;25(1):879. doi: 10.1186/s12891-024-07983-0.


DOI: 10.1186/s12891-024-07983-0
PMID: 39497130
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11536837/
Abstract

BACKGROUND: This study aimed to assess the quality and readability of large language model-generated responses to frequently asked questions (FAQs) about Kienböck's disease (KD).

METHODS: Nineteen FAQs about KD were selected and divided into three categories: general knowledge, diagnosis, and treatment. The questions were entered into the Chat Generative Pre-trained Transformer 4 (ChatGPT-4) webpage using the zero-shot prompting method, and the responses were recorded. Hand surgeons with at least 5 years of experience and advanced English proficiency were individually contacted via WhatsApp instant messaging and asked to assess the responses. The quality of each response was analyzed by 33 experienced hand surgeons using the Global Quality Scale (GQS). Readability was assessed with the Flesch-Kincaid Grade Level (FKGL) and the Flesch Reading Ease Score (FRES).

RESULTS: The mean GQS score was 4.28 out of a maximum of 5 points. Most raters assessed the quality as good (270 of 627 responses; 43.1%) or excellent (260 of 627 responses; 41.5%). The mean FKGL was 15.5 and the mean FRES was 23.4, both of which are considered above the college graduate level. No statistically significant differences were found in the quality or readability of responses across the general knowledge, diagnosis, and treatment categories.

CONCLUSIONS: ChatGPT-4 provided high-quality responses to FAQs about KD. However, the primary drawback was the poor readability of these responses. By improving the readability of ChatGPT's output, it could become a valuable information resource for individuals with KD.

LEVEL OF EVIDENCE: Level IV, observational study.
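The readability figures reported above (mean FKGL 15.5, mean FRES 23.4) come from the standard Flesch formulas: FKGL = 0.39 * (words/sentence) + 11.8 * (syllables/word) - 15.59, and FRES = 206.835 - 1.015 * (words/sentence) - 84.6 * (syllables/word). A minimal sketch of how such scores are computed, using a naive vowel-group syllable counter (an assumption for illustration; published analyses typically use dictionary-backed tools rather than this heuristic):

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping a common silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (FKGL, FRES) for a text using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    return fkgl, fres
```

On this scale, higher FKGL and lower FRES mean harder text: short, plain sentences score near grade-school level, while dense clinical prose of the kind the study measured pushes FKGL well past 12 and FRES toward or below zero.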


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6444/11536837/21120c3b73cd/12891_2024_7983_Fig1_HTML.jpg

Similar Articles

[1]
High accuracy but limited readability of large language model-generated responses to frequently asked questions about Kienböck's disease.

BMC Musculoskelet Disord. 2024-11-4

[2]
Accuracy and Readability of Artificial Intelligence Chatbot Responses to Vasectomy-Related Questions: Public Beware.

Cureus. 2024-8-28

[3]
Information Quality and Readability: ChatGPT's Responses to the Most Common Questions About Spinal Cord Injury.

World Neurosurg. 2024-1

[4]
Evaluating the accuracy and readability of ChatGPT in providing parental guidance for adenoidectomy, tonsillectomy, and ventilation tube insertion surgery.

Int J Pediatr Otorhinolaryngol. 2024-6

[5]
Appropriateness and readability of Google Bard and ChatGPT-3.5 generated responses for surgical treatment of glaucoma.

Rom J Ophthalmol. 2024

[6]
Dr. Google vs. Dr. ChatGPT: Exploring the Use of Artificial Intelligence in Ophthalmology by Comparing the Accuracy, Safety, and Readability of Responses to Frequently Asked Patient Questions Regarding Cataracts and Cataract Surgery.

Semin Ophthalmol. 2024-8

[7]
Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy.

Sci Rep. 2024-1-2

[8]
Readability analysis of ChatGPT's responses on lung cancer.

Sci Rep. 2024-7-26

[9]
Ensuring Accuracy and Equity in Vaccination Information From ChatGPT and CDC: Mixed-Methods Cross-Language Evaluation.

JMIR Form Res. 2024-10-30

[10]
Artificial intelligence insights into osteoporosis: assessing ChatGPT's information quality and readability.

Arch Osteoporos. 2024-3-19

Cited By

[1]
Appropriateness of Thyroid Nodule Cancer Risk Assessment and Management Recommendations Provided by Large Language Models.

J Imaging Inform Med. 2025-3-3

