Chatziisaak Dimitrios, Burri Pascal, Sparn Moritz, Hahnloser Dieter, Steffen Thomas, Bischofberger Stephan
Department of Surgery, Kantonsspital St. Gallen, St. Gallen, Switzerland.
Department of Surgery, Centre Hôpitalier Universitaire Vaudois, Lausanne, Switzerland.
BJS Open. 2025 May 7;9(3). doi: 10.1093/bjsopen/zraf040.
The objective of this study was to evaluate the concordance between therapeutic recommendations proposed by a multidisciplinary team meeting and those generated by a large language model (ChatGPT) for colorectal cancer. Although multidisciplinary teams represent the 'standard' for decision-making in cancer treatment, they require significant resources and may be susceptible to human bias. Artificial intelligence, particularly large language models such as ChatGPT, has the potential to enhance or optimize decision-making processes. The present study examines the potential for integrating artificial intelligence into clinical practice by comparing multidisciplinary team decisions with those generated by ChatGPT.
A retrospective, single-centre study was conducted involving consecutive patients with newly diagnosed colorectal cancer discussed at our multidisciplinary team meeting. Pre- and post-therapeutic multidisciplinary team meeting recommendations were compared with those generated by ChatGPT-4 and assessed for concordance.
One hundred consecutive patients with newly diagnosed colorectal cancer of all stages were included. In the pretherapeutic discussions, complete concordance was observed in 72.5% of cases, partial concordance in 10.2% and discordance in 17.3%. For post-therapeutic discussions, complete concordance increased to 82.8%; 11.8% of decisions displayed partial concordance and 5.4% demonstrated discordance. Discordance was more frequent in patients older than 77 years and in those with an American Society of Anesthesiologists classification of III or higher.
There is substantial concordance between the recommendations generated by ChatGPT and those provided by traditional multidisciplinary team meetings, indicating the potential utility of artificial intelligence in supporting clinical decision-making for colorectal cancer management.