Department of Orthopaedics, Xiangya Hospital, Central South University, #87 Xiangya Road, Changsha, Hunan, China.
Department of Neurology, The Second Xiangya Hospital, Central South University, Changsha, Hunan, China.
Medicine (Baltimore). 2024 Mar 15;103(11):e37458. doi: 10.1097/MD.0000000000037458.
Currently, there are limited studies assessing ChatGPT's ability to provide appropriate responses to medical questions. Our study aims to evaluate the adequacy of ChatGPT's responses to questions regarding osteoporotic fracture prevention and medical science. We created a list of 25 questions based on the guidelines and our clinical experience. Additionally, we included 11 medical science questions from the journal Science. Three patients, 3 non-medical professionals, 3 specialist doctors, and 3 scientists were recruited to evaluate the accuracy and appropriateness of responses generated by ChatGPT-3.5 on October 2, 2023. To simulate a consultation, an inquirer (either a patient or a non-medical professional) sent their questions to a consultant (a specialist doctor or a scientist) via a website. The consultant forwarded the questions to ChatGPT for answers, evaluated the answers for accuracy and appropriateness, and then sent them back to the inquirer via the website for further review. The primary outcome was the rate of appropriate, inappropriate, and unreliable ChatGPT responses as evaluated separately by the inquirer and consultant groups. Compared with orthopedic clinicians, patients rated the appropriateness of ChatGPT responses to questions about osteoporotic fracture prevention slightly higher, although the difference was not statistically significant (88% vs 80%, P = .70). For medical science questions, non-medical professionals and medical scientists gave similar ratings. In addition, the experts' ratings of the appropriateness of ChatGPT responses to osteoporotic fracture prevention questions and to medical science questions were comparable. On the other hand, patients rated the appropriateness of ChatGPT responses to osteoporotic fracture prevention questions slightly higher than that of responses to medical science questions (88% vs 72.7%, P = .34). ChatGPT is capable of providing comparably appropriate responses to medical science questions as well as to fracture prevention-related issues. Both the inquirers seeking advice and the consultants providing advice recognized ChatGPT's expertise in these areas.
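The abstract does not name the statistical test behind the reported proportion comparisons. As an illustrative sketch only, the hypothetical counts 22/25 vs 20/25 (consistent with the reported 88% vs 80% on the 25 fracture prevention questions) analyzed with a two-sided Fisher's exact test reproduce a P value of approximately .70; the actual counts and test used by the authors are assumptions here.

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table consistent with the reported rates (assumption, not
# taken from the paper): patients rated 22/25 responses appropriate (88%),
# orthopedic clinicians rated 20/25 appropriate (80%).
table = [
    [22, 3],  # patients: appropriate, not appropriate
    [20, 5],  # clinicians: appropriate, not appropriate
]

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, P = {p_value:.2f}")  # P is about .70
```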