Assessing ChatGPT Ability to Answer Frequently Asked Questions About Essential Tremor.

Affiliations

Department of Medicine, Surgery and Dentistry "Scuola Medica Salernitana", Neuroscience Section, University of Salerno, Via Allende 43, 84081 Baronissi, SA, Italy.

Department of Neurology, "Umberto I" Hospital, Nocera Inferiore (SA), Italy.

Publication information

Tremor Other Hyperkinet Mov (N Y). 2024 Jul 3;14:33. doi: 10.5334/tohm.917. eCollection 2024.

DOI: 10.5334/tohm.917

PMID: 38973820

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11225576/

Abstract

BACKGROUND

Large language models (LLMs) driven by artificial intelligence allow people to engage in direct conversations about their health. The accuracy and readability of the answers that ChatGPT, the best-known LLM, gives about Essential Tremor (ET), one of the most common movement disorders, have not yet been evaluated.

METHODS

Five professionals and 15 laypeople rated ChatGPT's answers to 10 questions about ET on a scale from 1 (poor) to 5 (excellent) for clarity, relevance, accuracy (professionals only), comprehensiveness, and overall value of the response. We also calculated the readability of the answers.
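The abstract states that readability was calculated but does not name the index. As an illustration only, the sketch below computes the Flesch Reading Ease score, one widely used index; treating it as representative of the study's method is an assumption. The function names are hypothetical, and the syllable counter is a crude vowel-group heuristic rather than what dedicated readability tools use.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease from raw counts.

    Higher scores mean easier text; scores below 30 are conventionally
    labeled "very difficult". Assumption: this index is illustrative only,
    the abstract does not say which readability formula was used.
    """
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups, then drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> float:
    """Score a passage, e.g. one chatbot answer (hypothetical helper)."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        raise ValueError("text contains no words")
    # Treat each run of terminal punctuation as one sentence boundary.
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_reading_ease(len(words), sentences, syllables)


if __name__ == "__main__":
    print(round(readability("Essential tremor is a common movement disorder."), 1))
```

Because the heuristic miscounts words such as "movement", scores from this sketch should be read as approximate.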

RESULTS

ChatGPT's answers received relatively positive evaluations from both groups, with median scores between 4 and 5 regardless of question type. However, inter-rater agreement was only moderate, particularly among professionals. Moreover, readability was poor for all examined answers.
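The abstract reports medians of 1 to 5 ratings and "moderate" inter-rater agreement without naming the agreement statistic. The following is a minimal sketch, assuming ratings are arranged with one row per answer and one column per rater; Fleiss' kappa is used here only as one common statistic for several raters and is not necessarily what the authors computed. The function names and example ratings are invented for illustration, not study data.

```python
from collections import Counter
from statistics import median


def item_medians(ratings: list[list[int]]) -> list[float]:
    """Median score per answer (rows = answers, columns = raters)."""
    return [median(row) for row in ratings]


def fleiss_kappa(ratings: list[list[int]]) -> float:
    """Fleiss' kappa, treating each 1-5 score as an unordered category.

    Assumption: every answer is scored by the same number of raters.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("need the same number (>= 2) of raters per item")
    counts = [Counter(row) for row in ratings]
    # Observed agreement for each item: share of agreeing rater pairs.
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall share of ratings in each category.
    totals: Counter = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating is identical")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Invented ratings: 3 answers, each scored by 3 raters.
    example = [[5, 4, 5], [4, 4, 3], [5, 5, 5]]
    print(item_medians(example))
    print(round(fleiss_kappa(example), 2))
```

On the conventional Landis and Koch scale, kappa values from 0.41 to 0.60 are often labeled moderate. Because plain Fleiss' kappa ignores ordering, a weighted or intraclass statistic would better reflect that a 4 versus 5 disagreement is smaller than a 1 versus 5 one.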

DISCUSSION

ChatGPT provided relatively accurate and relevant answers, although ratings varied within the group of professionals. This variability suggests that the raters' degree of literacy about ET influenced their ratings and, indirectly, that the quality of information provided in clinical practice also varies. Moreover, the readability of ChatGPT's answers was poor. LLMs will likely play a significant role in the future; therefore, health-related content generated by these tools should be monitored.

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5376/11225576/7a1d0450541b/tohm-14-1-917-g1.jpg

Similar articles

Is Information About Musculoskeletal Malignancies From Large Language Models or Web Resources at a Suitable Reading Level for Patients?

Clin Orthop Relat Res. 2025 Feb 1;483(2):306-315. doi: 10.1097/CORR.0000000000003263. Epub 2024 Sep 25.

Prescription of Controlled Substances: Benefits and Risks

Assessing ChatGPT's Educational Potential in Lung Cancer Radiotherapy From Clinician and Patient Perspectives: Content Quality and Readability Analysis.

JMIR Cancer. 2025 Aug 13;11:e69783. doi: 10.2196/69783.

Evaluation of ChatGPT-4 as an Online Outpatient Assistant in Puerperal Mastitis Management: Content Analysis of an Observational Study.

JMIR Med Inform. 2025 Jul 24;13:e68980. doi: 10.2196/68980.

A rapid and systematic review of the clinical effectiveness and cost-effectiveness of paclitaxel, docetaxel, gemcitabine and vinorelbine in non-small-cell lung cancer.

Health Technol Assess. 2001;5(32):1-195. doi: 10.3310/hta5320.

Using Artificial Intelligence ChatGPT to Access Medical Information About Chemical Eye Injuries: Comparative Study.

JMIR Form Res. 2025 Aug 13;9:e73642. doi: 10.2196/73642.

Systemic treatments for metastatic cutaneous melanoma.

Cochrane Database Syst Rev. 2018 Feb 6;2(2):CD011123. doi: 10.1002/14651858.CD011123.pub2.

Application of Large Language Models in Stroke Rehabilitation Health Education: 2-Phase Study.

J Med Internet Res. 2025 Jul 22;27:e73226. doi: 10.2196/73226.

Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.

Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.
