Nechay T V, Sazhin A V, Loban K M, Bogomolova A K, Suglob V V, Beniia T R
Pirogov Russian National Research Medical University, Moscow, Russia.
Khirurgiia (Mosk). 2024(8):6-14. doi: 10.17116/hirurgia20240816.
To evaluate the quality of recommendations provided by ChatGPT regarding inguinal hernia repair.
ChatGPT was asked 5 questions about the surgical management of inguinal hernias. The chatbot was assigned the role of an expert in herniology and instructed to search only specialized medical databases and to provide references and evidence. Herniology experts and surgeons (non-experts) rated the quality of the recommendations generated by ChatGPT on a 4-point scale (0 to 3 points). Statistical correlations between participants' ratings and their attitudes toward artificial intelligence were explored.
Experts scored the quality of ChatGPT responses lower than non-experts (2 (1-2) vs. 2 (2-3), p<0.001). The chatbot failed to provide valid references and actual evidence and fabricated half of the references it cited. Respondents were optimistic about the future of neural networks for clinical decision support, and most opposed restricting their use in healthcare.
We would not recommend non-specialized large language models as a sole or primary source of information for clinical decision-making or as a virtual search assistant.