Malik Sheza, Kharel Himal, Dahiya Dushyant S, Ali Hassam, Blaney Hanna, Singh Achintya, Dhar Jahnvi, Perisetti Abhilash, Facciorusso Antonio, Chandan Saurabh, Mohan Babu P
Internal Medicine, Rochester General Hospital, NY, USA (Sheza Malik, Himal Kharel).
Gastroenterology, Hepatology, University of Kansas School of Medicine, Kansas, USA (Dushyant S. Dahiya).
Ann Gastroenterol. 2024 Sep-Oct;37(5):514-526. doi: 10.20524/aog.2024.0907. Epub 2024 Aug 19.
In view of the growing complexity of managing anticoagulation for patients undergoing gastrointestinal (GI) procedures, this study evaluated ChatGPT-4's ability to provide accurate medical guidance, comparing it with an earlier artificial intelligence (AI) model (ChatGPT-3.5) and a retrieval-augmented generation (RAG)-supported model (ChatGPT4-RAG).
Thirty-six anticoagulation-related questions, based on professional guidelines, were answered by ChatGPT-4. Nine gastroenterologists assessed these responses for accuracy and relevance. ChatGPT-4's performance was also compared to that of ChatGPT-3.5 and ChatGPT4-RAG. Additionally, a survey was conducted to understand gastroenterologists' perceptions of ChatGPT-4.
ChatGPT-4's responses were significantly more accurate and coherent than those of ChatGPT-3.5, with 30.5% of responses fully accurate and 47.2% generally accurate. ChatGPT4-RAG demonstrated a greater ability to integrate current information, achieving 75% full accuracy. Notably, 51.8% of responses were fully accurate for diagnostic and therapeutic esophagogastroduodenoscopy, 42.8% for endoscopic retrograde cholangiopancreatography with and without stent placement, and 50% for diagnostic and therapeutic colonoscopy.
ChatGPT4-RAG significantly advances anticoagulation management in endoscopic procedures, offering reliable and precise medical guidance. However, medicolegal considerations mean that a 75% full accuracy rate remains inadequate for independent clinical decision-making. AI may be more appropriately utilized to support and confirm clinicians' decisions, rather than replace them. Further evaluation is essential to maintain patient confidentiality and the integrity of the physician-patient relationship.
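The abstract names a RAG-supported model (ChatGPT4-RAG) but does not describe its implementation. The following is a minimal, illustrative sketch of how guideline text could be paired with a ChatGPT-4 prompt via retrieval-augmented generation, assuming the OpenAI Python SDK; the guideline excerpts, embedding model, prompt wording, and parameters are hypothetical and should not be read as the authors' pipeline.

```python
# Illustrative RAG sketch only; the study does not disclose its implementation.
# Model names, guideline excerpts, and prompts below are assumptions.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts):
    """Embed a list of strings; the embedding model name is an assumption."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Hypothetical guideline excerpts on periprocedural anticoagulation.
guideline_chunks = [
    "For low-bleeding-risk procedures such as diagnostic EGD, anticoagulation "
    "may generally be continued.",
    "For high-bleeding-risk procedures such as polypectomy, warfarin is "
    "typically held several days beforehand, with bridging per thrombotic risk.",
]
chunk_vectors = embed(guideline_chunks)

def answer(question, k=2):
    """Retrieve the k most similar guideline chunks and include them in the prompt."""
    q_vec = embed([question])[0]
    sims = chunk_vectors @ q_vec / (
        np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q_vec)
    )
    context = "\n".join(guideline_chunks[i] for i in np.argsort(sims)[::-1][:k])
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Answer using only the guideline excerpts provided."},
            {"role": "user",
             "content": f"Guideline excerpts:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

print(answer("Should apixaban be held before a diagnostic colonoscopy?"))
```

In a setup of this kind, the retrieval step grounds the model's answer in current guideline text rather than in its training data alone, which is the mechanism the abstract credits for ChatGPT4-RAG's higher full-accuracy rate.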