Geantă Marius, Bădescu Daniel, Chirca Narcis, Nechita Ovidiu Cătălin, Radu Cosmin George, Rascu Ștefan, Rădăvoi Daniel, Sima Cristian, Toma Cristian, Jinga Viorel
Department of Urology, "Carol Davila" University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania.
Center for Innovation in Medicine, 42J Theodor Pallady Blvd., 032266 Bucharest, Romania.
Bioengineering (Basel). 2024 Jun 27;11(7):654. doi: 10.3390/bioengineering11070654.
This study assesses the effectiveness of chatbots powered by Large Language Models (LLMs), namely ChatGPT 3.5, CoPilot, and Gemini, in delivering prostate cancer information, compared to the official Patient's Guide. Using 25 expert-validated questions, we conducted a comparative analysis to evaluate accuracy, timeliness, completeness, and understandability on a Likert scale. Statistical analyses were used to quantify the performance of each model. Results indicate that ChatGPT 3.5 consistently outperformed the other models, establishing itself as a robust and reliable source of information. CoPilot also performed effectively, albeit slightly less so than ChatGPT 3.5. Despite the strengths of the Patient's Guide, the advanced capabilities of LLMs such as ChatGPT significantly enhance educational tools in healthcare. The findings underscore the need for ongoing innovation and improvement in AI applications within the health sector, especially in light of the ethical implications highlighted by the forthcoming EU AI Act. Future research should focus on investigating potential biases in AI-generated responses and their impact on patient outcomes.