

Physician vs. AI-generated messages in urology: evaluation of accuracy, completeness, and preference by patients and physicians.

Author information

Robinson Eric J, Qiu Chunyuan, Sands Stuart, Khan Mohammad, Vora Shivang, Oshima Kenichiro, Nguyen Khang, DiFronzo L Andrew, Rhew David, Feng Mark I

Affiliations

Department of Urology, Los Angeles Medical Center, Kaiser Permanente, Los Angeles, CA, USA.

Department of Anesthesiology, Baldwin Park Medical Center, Kaiser Permanente, Baldwin Park, CA, USA.

Publication information

World J Urol. 2024 Dec 27;43(1):48. doi: 10.1007/s00345-024-05399-y.

Abstract

PURPOSE

To evaluate the accuracy, comprehensiveness, empathetic tone, and patient preference for AI and urologist responses to patient messages concerning common BPH questions across phases of care.

METHODS

Cross-sectional study evaluating responses generated by 2 AI chatbots and 4 urologists to 20 BPH-related questions in a simulated clinical messaging environment without direct patient interaction. Accuracy, completeness, and empathetic tone of responses were assessed by experts using Likert scales, and preferences and perceptions of authorship (chatbot vs. human) were rated by non-medical evaluators.

RESULTS

Five non-medical volunteers independently evaluated, ranked, and inferred the source for 120 responses each (n = 600 total). For volunteer evaluations, the mean (SD) empathy score of chatbots, 3.0 (1.4) (moderately empathetic), was significantly higher than that of urologists, 2.1 (1.1) (slightly empathetic) (p < 0.001); the mean (SD) preference ranking for chatbots, 2.6 (1.6), was significantly better (lower) than the urologist ranking, 3.9 (1.6) (p < 0.001). Two subject matter experts (SMEs) independently evaluated 120 responses each (answers to 20 questions from 4 urologists and 2 chatbots, n = 240 total). For SME evaluations, the mean (SD) accuracy score for chatbots was 4.5 (1.1) (nearly all correct) and not significantly different from that of urologists, 4.6 (1.2). The mean (SD) completeness score for chatbots was 2.4 (0.8) (comprehensive), significantly higher than that of urologists, 1.6 (0.6) (adequate) (p < 0.001).
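The abstract does not state which statistical test produced the p-values above. As a minimal sketch only, assuming the ordinal Likert ratings were compared with a Mann-Whitney U test (a common choice for such data), a comparison of the kind reported could be computed as follows; the scores below are hypothetical placeholders, not the study's data.

```python
# Minimal sketch: comparing Likert-scale ratings for chatbot vs. urologist
# responses. Assumes a Mann-Whitney U test on ordinal data; the test actually
# used in the paper is not specified in the abstract, and these ratings are
# hypothetical placeholders, not the study's data.
import numpy as np
from scipy.stats import mannwhitneyu

chatbot_empathy = np.array([3, 4, 2, 5, 3, 4, 1, 3, 4, 2])    # hypothetical 1-5 ratings
urologist_empathy = np.array([2, 1, 3, 2, 2, 3, 1, 2, 2, 3])  # hypothetical 1-5 ratings

# Report group means and sample standard deviations, as in the abstract.
print(f"Chatbot mean (SD): {chatbot_empathy.mean():.1f} ({chatbot_empathy.std(ddof=1):.1f})")
print(f"Urologist mean (SD): {urologist_empathy.mean():.1f} ({urologist_empathy.std(ddof=1):.1f})")

# Two-sided Mann-Whitney U test for a difference between the two groups.
stat, p = mannwhitneyu(chatbot_empathy, urologist_empathy, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p:.3f}")
```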

CONCLUSION

Answers to patient BPH messages generated by chatbots were evaluated by experts as being as accurate as, and more complete than, urologist answers. Non-medical volunteers preferred chatbot-generated messages and considered them more empathetic than answers generated by urologists.

