Golan Roei, Ripps Sarah J, Reddy Raghuram, Loloi Justin, Bernstein Ari P, Connelly Zachary M, Golan Noa S, Ramasamy Ranjith
Department of Clinical Sciences, Florida State University College of Medicine, Tallahassee, USA.
Herbert Wertheim College of Medicine, Florida International University, Miami, USA.
Cureus. 2023 Jul 20;15(7):e42214. doi: 10.7759/cureus.42214. eCollection 2023 Jul.
Introduction: Artificial intelligence (AI) platforms have gained widespread attention for their distinct ability to generate automated responses to various prompts. However, their role in assessing the quality and readability of a provided text remains unclear. Thus, the purpose of this study was to evaluate the proficiency of the conversational generative pre-trained transformer (ChatGPT) in applying the DISCERN tool to evaluate the quality of online content regarding shock wave therapy for erectile dysfunction.

Methods: Websites were identified using a Google search of "shock wave therapy for erectile dysfunction" with location filters disabled. Readability was analyzed using Readable software (Readable.com, Horsham, United Kingdom). Quality was assessed independently by three reviewers using the DISCERN tool. The same plain-text files were then input into ChatGPT to determine whether it produced comparable metrics for readability and quality.

Results: The study revealed a notable disparity between ChatGPT's readability assessment and that obtained from an established tool, Readable.com (p<0.05), indicating a lack of alignment between ChatGPT's scoring and that of established tools. Similarly, the DISCERN score generated by ChatGPT differed significantly from the scores generated manually by human evaluators (p<0.05), suggesting that ChatGPT may not be capable of accurately identifying poor-quality information sources regarding shock wave therapy as a treatment for erectile dysfunction.

Conclusion: ChatGPT's evaluation of the quality and readability of online text regarding shock wave therapy for erectile dysfunction differs from that of human raters and trusted tools. Therefore, ChatGPT's current capabilities are not sufficient for reliably assessing the quality and readability of textual content. Further research is needed to elucidate the role of AI in the objective evaluation of online medical content in other fields. Continued development of AI and the incorporation of tools such as DISCERN into AI software may enhance the way patients navigate the web in search of high-quality medical content in the future.