Department of Biochemistry, New South Wales Health Pathology, Nepean Hospital, NSW, Australia.
Ann Clin Biochem. 2022 Nov;59(6):447-449. doi: 10.1177/00045632221128687. Epub 2022 Sep 22.
Explainability, the aspect of artificial intelligence-based decision support (ADS) systems that allows users to understand why predictions are made, offers many potential benefits. One common claim is that explainability increases user trust, yet this has not been established in healthcare contexts. For advanced algorithms such as artificial neural networks, generating explanations is not trivial: it requires a second algorithm. The assumption of improved user trust should therefore be investigated to determine whether it justifies the additional complexity.
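As a hedged illustration of this "second algorithm" point, the sketch below runs a post-hoc explainer (permutation feature importance) over a toy neural network. The data, model architecture, and features here are hypothetical stand-ins and are not drawn from the study's ADS system.

```python
# Minimal sketch of the "second algorithm" idea: a post-hoc explainer
# (permutation feature importance) run on top of a trained neural network.
# All data and model details below are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.inspection import permutation_importance

# Toy data standing in for sets of laboratory results.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)

# The primary algorithm: a neural network producing WBIT-style predictions.
model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                      random_state=0).fit(X, y)

# The secondary algorithm: shuffle each feature and measure the drop in
# accuracy, yielding a per-feature "explanation" of the model's behaviour.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: importance {imp:.3f}")
```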
Biochemistry staff completed a wrong blood in tube (WBIT) error identification task with the help of an ADS system. Half of the volunteers were provided with both ADS predictions and explanations for those predictions, while the other half received predictions alone. The two groups were compared on their rate of agreement with ADS predictions, as an index of user trust, and on WBIT error detection performance. Since the AI model used to generate predictions was known to outperform laboratory staff, increased trust was expected to improve user performance.
Volunteers reviewed 1590 sets of results. The volunteers provided with explanations demonstrated no difference in their rate of agreement with the ADS system compared with volunteers receiving predictions alone (83.3% versus 81.8%, P = 0.46). The two volunteer groups were also equivalent in accuracy, sensitivity and specificity for WBIT error identification (all P-values > 0.78).
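For illustration, the sketch below shows how such an agreement-rate comparison could be computed as a chi-square test on two proportions. The per-group counts are assumptions (an even split of the 1590 reviews), since the abstract does not report the underlying contingency table.

```python
# Hedged sketch of the agreement-rate comparison: a chi-square test on two
# proportions. The group sizes below are assumed for illustration only; the
# abstract reports the total (1590 reviews) and the two rates, not the counts.
from scipy.stats import chi2_contingency

n_explained, n_predictions_only = 795, 795       # assumed even split of 1590
agree_explained = round(0.833 * n_explained)     # 83.3% agreement
agree_only = round(0.818 * n_predictions_only)   # 81.8% agreement

table = [
    [agree_explained, n_explained - agree_explained],
    [agree_only, n_predictions_only - agree_only],
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, P = {p:.2f}")
```

Under this assumed split, the test yields a P-value in the vicinity of the reported 0.46, though the study's actual per-group counts may differ.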
For a WBIT error identification task, there was no evidence to justify the additional complexity of explainability on the grounds of increased user trust.