Ariana L. Shaari, Disha P. Patil, Saad Mohammed, Parsa P. Salehi
Department of Head and Neck Surgery, Rutgers New Jersey Medical School.
Rutgers School of Dental Medicine, Newark, NJ.
J Craniofac Surg. 2024 Nov 4. doi: 10.1097/SCS.0000000000010832.
To determine the readability and accuracy of information regarding mandible fractures generated by Chat Generative Pre-trained Transformer (ChatGPT) versions 3.5 and 4o.
Patients are increasingly turning to generative artificial intelligence to answer medical queries. To date, the accuracy and readability of responses regarding mandible fractures have not been assessed.
Twenty patient questions regarding mandible fractures were developed by querying AlsoAsked (https://alsoasked.com), SearchResponse (https://searchresponse.io), and Answer the Public (https://answerthepublic.com/). The questions were posed to ChatGPT 3.5 and 4o. Readability was assessed by calculating the Flesch-Kincaid Reading Ease, Flesch-Kincaid Grade Level, number of sentences, and percentage of complex words. Accuracy was assessed by a board-certified facial plastic and reconstructive otolaryngologist using a 5-point Likert scale.
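The Flesch-Kincaid formulas used here depend only on counts of sentences, words, and syllables. A minimal sketch of how such scores can be computed is below; the syllable counter is a common vowel-group heuristic (an approximation, not the exact tool the authors used), and the function names are illustrative assumptions.

```python
import re

def count_syllables(word: str) -> int:
    # Heuristic: count contiguous vowel groups; drop a trailing silent 'e'.
    # Approximate only -- dedicated readability tools use dictionaries.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    """Return Flesch-Kincaid metrics plus the counts reported in the study."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    # "Complex words" here means words of 3+ syllables (a standard convention).
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return {
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "fk_grade_level": 0.39 * wps + 11.8 * spw - 15.59,
        "sentences": len(sentences),
        "pct_complex_words": 100 * complex_words / len(words),
    }
```

For patient education materials, a Reading Ease score of roughly 60 or above (about a 6th-8th grade level) is the commonly recommended target that the ChatGPT responses failed to meet.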
No significant differences were observed between the two versions in readability or accuracy. Readability was above recommended levels for patient education materials. Accuracy was low, and the majority of responses were deemed inappropriate for patient use, containing multiple inaccuracies and/or omissions.
ChatGPT produced responses written at a reading level too high for the average patient and containing several inaccurate statements. Patients and clinicians should be aware of the limitations of generative artificial intelligence when seeking medical information regarding mandible fractures.