Efendioglu Eyyup Murat, Cigiloglu Ahmet
Department of Internal Medicine, Division of Geriatric Medicine, Gaziantep City Hospital, Gaziantep, Turkey.
Department of Internal Medicine, Division of Geriatric Medicine, Kahramanmaraş Necip Fazıl City Hospital, 46050, Dulkadiroglu, Kahramanmaraş, Turkey.
Eur Geriatr Med. 2025 Apr 21. doi: 10.1007/s41999-025-01202-2.
ChatGPT, a comprehensive language processing model, provides the opportunity for supportive and professional interactions with patients. However, its use to address patients' frequently asked questions (FAQs) and the readability of the text generated by ChatGPT remain unexplored, particularly in geriatrics. We identified the FAQs about common geriatric syndromes and assessed the accuracy and readability of the responses provided by ChatGPT.
Two geriatricians with extensive knowledge and experience in geriatric syndromes independently reviewed the 28 responses provided by ChatGPT. The accuracy of the responses generated by ChatGPT was categorized on a rating scale from 0 (harmful) to 4 (excellent) based on current guidelines and approaches. The readability of the text generated by ChatGPT was assessed by administering two tests: the Flesch-Kincaid Reading Ease (FKRE) and the Flesch-Kincaid Grade Level (FKGL).
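The FKRE and FKGL scores used above can be computed from three raw counts (words, sentences, syllables) with the standard Flesch-Kincaid formulas. A minimal sketch in Python, assuming the counts have already been extracted from the text (automatic syllable counting is a separate heuristic step not shown here):

```python
def flesch_kincaid(total_words: int, total_sentences: int, total_syllables: int):
    """Return (FKRE, FKGL) from raw text counts.

    FKRE (Reading Ease): higher = easier; ~0-30 is 'very difficult',
    typically understood by college graduates.
    FKGL (Grade Level): approximate US school grade needed to understand the text.
    """
    wps = total_words / total_sentences    # average words per sentence
    spw = total_syllables / total_words    # average syllables per word
    fkre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fkre, fkgl

# Illustrative counts (hypothetical, not from the study's corpus):
# 100 words across 5 sentences with 180 syllables.
fkre, fkgl = flesch_kincaid(100, 5, 180)
# fkre ≈ 34.3 ('difficult'), fkgl ≈ 13.5 (college level)
```

For context, the study's mean FKRE of 25.2 falls in the "very difficult" band (0-30), and the mean FKGL of 14.5 corresponds to roughly two years of college education.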
ChatGPT generated responses with an overall mean accuracy score of 88% (3.52/4). Responses generated for sarcopenia diagnosis and depression treatment in older adults had the lowest accuracy scores (2.0 and 2.5, respectively). The mean FKRE score of the texts was 25.2, while the mean FKGL score was 14.5.
The accuracy scores of the responses generated by ChatGPT were high in most common geriatric syndromes except for sarcopenia diagnosis and depression treatment. Moreover, the text generated by ChatGPT was very difficult to read and best understood by college graduates. ChatGPT may reduce the uncertainty many patients face. Nevertheless, it remains advisable to consult with subject matter experts when undertaking consequential decision-making.