Rainey Clare, Bond Raymond, McConnell Jonathan, Hughes Ciara, Kumar Devinder, McFadden Sonyia
Ulster University, School of Health Sciences, York St, Belfast, Northern Ireland.
Ulster University, School of Computing, York St, Belfast, Northern Ireland.
PLOS Digit Health. 2024 Aug 7;3(8):e0000560. doi: 10.1371/journal.pdig.0000560. eCollection 2024 Aug.
Artificial Intelligence (AI) has been increasingly integrated into healthcare settings, including the radiology department, to aid radiographic image interpretation, including reporting by radiographers. Trust has been cited as a barrier to the effective clinical implementation of AI. Appropriately calibrated trust in AI will be important in the future to ensure the ethical use of these systems for the benefit of the patient, the clinician and health services. Explainable AI methods, such as heatmaps, have been proposed to increase AI transparency and trust by elucidating which parts of an image the AI 'focussed on' when making its decision. The aim of this novel study was to quantify the impact of different forms of AI feedback on expert clinicians' trust. Whilst this study was conducted in the UK, it has potential international application and impact for AI interface design, either globally or in countries with a similar cultural and/or economic status to the UK. A convolutional neural network was built for this study and trained, validated and tested on a publicly available dataset of MUsculoskeletal RAdiographs (MURA), with binary diagnoses and Gradient-weighted Class Activation Maps (GradCAM) as outputs. Reporting radiographers (n = 12) were recruited to this study from all four regions of the UK. Qualtrics was used to present each participant with a total of 18 complete examinations from the MURA test dataset (each examination contained more than one radiographic image). Participants were presented sequentially with the images alone, then the images with heatmaps, and finally an AI binary diagnosis. Perceived trust in the AI system was recorded following the presentation of each heatmap and each binary diagnosis. Participants were asked to indicate whether they would change their mind (decision switch) in response to the AI feedback. Participants disagreed with the AI heatmaps for the abnormal examinations 45.8% of the time and agreed with the binary feedback on 86.7% of examinations (26/30 presentations). Only two participants indicated that they would decision switch in response to all AI feedback (GradCAM and binary) (0.7%, n = 2) across all datasets. Agreement with the localisation of pathology on the heatmap was indicated in 22.2% of responses (n = 32). The level of agreement with the GradCAM heatmaps and the binary diagnosis was correlated with trust (GradCAM: -.515; -.584, significant large negative correlations (p < .01); binary diagnosis: -.309; -.369, significant medium negative correlations (p < .01)). This study shows that, for these participants, the extent of agreement with both the AI binary diagnosis and the heatmap was correlated with trust in AI, where greater agreement with a form of AI feedback was associated with greater trust in AI, particularly for the heatmap form of feedback. Forms of explainable AI should be developed with cognisance of the need for precision and accuracy in localisation to promote appropriate trust in clinical end users.
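The abstract does not specify the CNN architecture or GradCAM implementation, so the following is a minimal sketch of how a GradCAM heatmap can be produced in PyTorch, assuming a torchvision ResNet-18 backbone with a two-class diagnosis head; the layer choice ("layer4") and the input tensor are hypothetical stand-ins, not the study's model.

```python
# Minimal GradCAM sketch (assumptions: ResNet-18 backbone, binary head,
# "layer4" as the target convolutional block; none confirmed by the abstract).
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None)                 # hypothetical backbone
model.fc = torch.nn.Linear(model.fc.in_features, 2)   # binary diagnosis head
model.eval()

activations, gradients = {}, {}

def fwd_hook(module, inp, out):
    activations["value"] = out.detach()               # feature maps on forward pass

def bwd_hook(module, grad_in, grad_out):
    gradients["value"] = grad_out[0].detach()         # gradients on backward pass

target_layer = model.layer4                           # last conv block (assumption)
target_layer.register_forward_hook(fwd_hook)
target_layer.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)                       # placeholder radiograph tensor
logits = model(x)
cls = logits.argmax(dim=1).item()                     # predicted class
model.zero_grad()
logits[0, cls].backward()                             # gradient of the predicted logit

# GradCAM: weight each feature map by its spatially averaged gradient,
# sum across channels, keep positive evidence only, upsample, normalise.
weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # heatmap in [0, 1]
```

The resulting `cam` tensor can be overlaid on the radiograph as the kind of heatmap shown to participants; in practice the heatmap's localisation precision, rather than its mere presence, is what the study links to trust.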
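The abstract reports correlation coefficients between agreement and trust but does not name the statistic; the sketch below assumes a Spearman rank correlation, a plausible choice for ordinal rating scales, and uses invented placeholder ratings purely for illustration, not study data.

```python
# Hypothetical agreement-vs-trust correlation sketch (assumption: Spearman's
# rho; the ratings below are invented placeholders, not the study's data).
from scipy.stats import spearmanr

agreement = [5, 4, 4, 3, 2, 5, 1, 3, 2, 4, 5, 1]  # placeholder agreement ratings
trust     = [1, 2, 2, 3, 4, 1, 5, 3, 4, 2, 1, 5]  # placeholder trust ratings

rho, p = spearmanr(agreement, trust)
print(f"rho = {rho:.3f}, p = {p:.4f}")
```

A negative coefficient here, as in the abstract, is consistent with inversely coded scales (e.g. lower numbers indicating stronger agreement or trust), which is how greater agreement can correspond to greater trust despite the negative sign.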