van der Zander Quirine E W, Roumans Rachel, Kusters Carolus H J, Dehghani Nikoo, Masclee Ad A M, de With Peter H N, van der Sommen Fons, Snijders Chris C P, Schoon Erik J
Department of Gastroenterology and Hepatology, Maastricht University Medical Center, Maastricht, The Netherlands; GROW, School for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands.
Human-Technology Interaction, Eindhoven University of Technology, Eindhoven, The Netherlands.
Gastrointest Endosc. 2024 Dec;100(6):1070-1078.e10. doi: 10.1016/j.gie.2024.06.029. Epub 2024 Jun 26.
Computer-aided diagnosis (CADx) for the optical diagnosis of colorectal polyps has been investigated extensively. However, studies on human-artificial intelligence interaction are lacking. Our aim was to investigate endoscopists' trust in CADx by evaluating whether communicating a calibrated algorithm confidence score improved trust.
Endoscopists optically diagnosed 60 colorectal polyps. Initially, endoscopists diagnosed the polyps without CADx assistance (initial diagnosis). Immediately afterward, the same polyp was shown again with a CADx prediction: either a prediction alone (benign or premalignant) or a prediction accompanied by a calibrated confidence score (0-100). A confidence score of 0 indicated a benign prediction, and a score of 100 a (pre)malignant prediction. For half of the polyps, CADx was mandatory; for the other half, CADx was optional. After reviewing the CADx prediction, endoscopists made a final diagnosis. Histopathology was used as the reference standard. Endoscopists' trust in CADx was measured as CADx prediction utilization: the willingness to follow CADx predictions when the endoscopists initially disagreed with the CADx prediction.
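The utilization measure described above can be sketched as a simple proportion. This is a minimal illustration assuming one record per endoscopist-polyp reading with binary labels ("benign" / "premalignant"); the `Case` class and `prediction_utilization` function are hypothetical names, not the study's analysis code.

```python
from dataclasses import dataclass


@dataclass
class Case:
    # Hypothetical per-reading record; labels assumed binary.
    initial: str  # unassisted (initial) diagnosis
    cadx: str     # CADx prediction shown afterward
    final: str    # diagnosis after reviewing the CADx prediction


def prediction_utilization(cases: list[Case]) -> float:
    """Share of initial disagreements in which the final diagnosis follows CADx."""
    # Only readings where the endoscopist initially disagreed with CADx count.
    disagreements = [c for c in cases if c.initial != c.cadx]
    if not disagreements:
        raise ValueError("no initial disagreements to evaluate")
    followed = sum(1 for c in disagreements if c.final == c.cadx)
    return followed / len(disagreements)


cases = [
    Case("benign", "premalignant", "premalignant"),  # disagreement, CADx followed
    Case("benign", "premalignant", "benign"),        # disagreement, CADx not followed
    Case("premalignant", "benign", "premalignant"),  # disagreement, CADx not followed
    Case("benign", "benign", "benign"),              # agreement, excluded
]
print(f"{prediction_utilization(cases):.1%}")
```

Agreements are excluded from the denominator because following CADx is indistinguishable from keeping one's own diagnosis when the two already match.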
Twenty-three endoscopists participated. Presenting CADx predictions increased the endoscopists' diagnostic accuracy (69.3% initial vs 76.6% final diagnosis, P < .001). The CADx prediction was used in 36.5% (n = 183 of 501) of disagreements. Adding a confidence score led to lower CADx prediction utilization, except when the confidence score surpassed 60. Mandatory CADx decreased CADx prediction utilization compared with optional CADx. Appropriate trust (using correct CADx predictions or disregarding incorrect ones) was 48.7% (n = 244 of 501).
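Appropriate trust can likewise be read as a proportion over the disagreement cases: a reading counts as appropriate when a correct CADx prediction is followed or an incorrect one is disregarded. The sketch below assumes binary labels and records already restricted to initial disagreements; the `Disagreement` class and `appropriate_trust` function are hypothetical, and the final lines only reproduce the reported percentages from the reported counts.

```python
from dataclasses import dataclass


@dataclass
class Disagreement:
    # Hypothetical record for a reading where the initial diagnosis differed from CADx.
    cadx: str       # CADx prediction
    final: str      # endoscopist's final diagnosis
    histology: str  # reference-standard diagnosis


def appropriate_trust(cases: list[Disagreement]) -> float:
    """Share of disagreements where correct CADx was followed or incorrect CADx disregarded."""
    if not cases:
        raise ValueError("no disagreements to evaluate")
    appropriate = 0
    for c in cases:
        cadx_correct = c.cadx == c.histology
        followed = c.final == c.cadx
        # Appropriate if (correct and followed) or (incorrect and not followed).
        if cadx_correct == followed:
            appropriate += 1
    return appropriate / len(cases)


# Reported counts from the abstract, converted to percentages.
print(f"utilization: {183 / 501:.1%}")        # 36.5%
print(f"appropriate trust: {244 / 501:.1%}")  # 48.7%
```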
Appropriate trust was common, and CADx prediction utilization was highest for optional CADx without confidence scores. These results underline the importance of a better understanding of human-artificial intelligence interaction.