Taşkaldıran Işılay, Emir Önder Çağatay, Gökbulut Püren, Koç Gönül, Kuşkonmaz Şerife Mehlika
Department of Endocrinology and Metabolism, Ankara Training and Research Hospital, Ankara, Turkey.
Digit Health. 2024 Aug 28;10:20552076241278692. doi: 10.1177/20552076241278692. eCollection 2024 Jan-Dec.
Chat Generative Pre-trained Transformer (ChatGPT) is now used in various fields of healthcare to answer health-related questions and to evaluate available information. Primary hyperparathyroidism is a common endocrine disorder. We aimed to evaluate the accuracy and quality of ChatGPT's responses to questions about hyperparathyroidism cases discussed at multidisciplinary endocrinology meetings.
ChatGPT-4 was asked to respond to 10 hyperparathyroidism cases that had been evaluated at multidisciplinary endocrinology meetings. Two endocrinologists independently scored the accuracy, completeness, and quality of the responses. Accuracy and completeness were rated on Likert scales, and quality was rated on the global quality scale (GQS).
No misleading information was detected in the responses. For diagnosis, the mean accuracy score (range 1 to 5) was 4.9 ± 0.1 and the mean completeness score (range 1 to 3) was 3.0. For further examination, the mean accuracy and completeness scores were 4.8 ± 0.13 and 2.6 ± 0.16, respectively. For treatment recommendations, the mean accuracy and completeness scores were 4.9 ± 0.1 and 2.4 ± 0.16, respectively. The GQS evaluation rated 80% of the responses as high quality and 20% as medium quality.
In this study, the accuracy and quality of ChatGPT-4's responses to questions about patients with hyperparathyroidism were generally high. Artificial intelligence may therefore serve as a valuable tool in healthcare. However, the limitations and risks of ChatGPT should also be evaluated.