Department of Orthopaedics, Xiangya Hospital, Central South University, #87 Xiangya Road, Changsha, Hunan, China.
Department of Neurology, The Second Xiangya Hospital, Central South University, Changsha, Hunan, China.
Medicine (Baltimore). 2024 Mar 15;103(11):e37458. doi: 10.1097/MD.0000000000037458.
Currently, there are limited studies assessing ChatGPT's ability to provide appropriate responses to medical questions. Our study aims to evaluate the adequacy of ChatGPT's responses to questions regarding osteoporotic fracture prevention and medical science. We created a list of 25 questions based on the guidelines and our clinical experience. Additionally, we included 11 medical science questions from the journal Science. Three patients, 3 non-medical professionals, 3 specialist doctors, and 3 scientists were recruited to evaluate the accuracy and appropriateness of responses generated by ChatGPT-3.5 on October 2, 2023. To simulate a consultation, an inquirer (either a patient or a non-medical professional) sent their questions to a consultant (a specialist doctor or a scientist) via a website. The consultant forwarded the questions to ChatGPT for answers, evaluated the answers for accuracy and appropriateness, and then sent them back to the inquirer via the website for further review. The primary outcome was the rate of appropriate, inappropriate, and unreliable ChatGPT responses as evaluated separately by the inquirer and consultant groups. Compared with orthopedic clinicians, patients rated the appropriateness of ChatGPT responses to questions about osteoporotic fracture prevention slightly higher, although the difference was not statistically significant (88% vs 80%, P = .70). For medical science questions, non-medical professionals and medical scientists gave similar ratings. In addition, the experts' ratings of the appropriateness of ChatGPT responses to osteoporotic fracture prevention questions and to medical science questions were comparable. On the other hand, patients rated the appropriateness of ChatGPT responses to osteoporotic fracture prevention questions slightly higher than that of responses to medical science questions (88% vs 72.7%, P = .34). ChatGPT is capable of providing comparably appropriate responses to medical science questions as well as to fracture prevention-related issues. Both the inquirers seeking advice and the consultants providing advice recognized ChatGPT's expertise in these areas.
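The abstract does not name the statistical test behind the reported proportion comparisons. As an illustrative sketch only, the hypothetical counts 22/25 vs 20/25 (consistent with the reported 88% vs 80% on the 25 fracture prevention questions) analyzed with a two-sided Fisher's exact test reproduce a P value of approximately .70; the actual counts and test used by the authors are assumptions here.

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table consistent with the reported rates (assumption, not
# taken from the paper): patients rated 22/25 responses appropriate (88%),
# orthopedic clinicians rated 20/25 appropriate (80%).
table = [
    [22, 3],  # patients: appropriate, not appropriate
    [20, 5],  # clinicians: appropriate, not appropriate
]

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, P = {p_value:.2f}")  # P is about .70
```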