Parra Rodrigo, Ojeda Verena, Vázquez Noguera Jose Luis, García-Torres Miguel, Mello-Román Julio César, Villalba Cynthia, Facon Jacques, Divina Federico, Cardozo Olivia, Castillo Verónica Elisa, Matto Ingrid Castro
Centro de Investigación, Universidad Americana, Avenida Brasilia 1100, Asunción 1206, Paraguay.
Data Science and Big Data Lab., Universidad Pablo de Olavide, ES-41013 Seville, Spain.
Diagnostics (Basel). 2021 Oct 21;11(11):1951. doi: 10.3390/diagnostics11111951.
In the automatic diagnosis of ocular toxoplasmosis (OT), Deep Learning (DL) has emerged as a powerful and promising approach. However, despite the good performance of the models, their decision rules should be interpretable in order to elicit trust from the medical community. The development of an evaluation methodology that assesses DL models through interpretability methods is therefore a challenging but necessary task for extending the use of AI among clinicians. In this work, we propose a novel methodology to quantify the similarity between the decision rules used by a DL model and those used by an ophthalmologist, based on the assumption that doctors are more likely to trust a prediction that rests on decision rules they can understand. Given an eye fundus image with OT, the proposed methodology compares the segmentation mask of OT lesions labeled by an ophthalmologist with the attribution matrix produced by interpretability methods. Furthermore, an open dataset that includes the eye fundus images and the segmentation masks is shared with the community. The proposed methodology was tested on three different DL architectures. The results suggest that more complex models tend to perform worse in terms of likelihood to be trusted, while achieving better sensitivity and specificity.
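The core comparison described in the abstract — overlap between an expert's lesion mask and a model's attribution matrix — can be sketched as follows. This is a minimal illustration, not the paper's actual metric: the exact similarity measure is not specified in the abstract, so the function name, the IoU-over-top-attributed-pixels formulation, and the `top_frac` parameter are all assumptions for illustration.

```python
import numpy as np

def attribution_mask_agreement(attribution, lesion_mask, top_frac=0.1):
    """Hypothetical agreement score between an attribution matrix and an
    expert segmentation mask: IoU between the lesion mask and the set of
    the top-`top_frac` most strongly attributed pixels.

    NOTE: this is an illustrative sketch; the paper's actual metric is
    not described in the abstract.
    """
    attribution = np.abs(np.asarray(attribution, dtype=float))
    lesion_mask = np.asarray(lesion_mask).astype(bool)

    # Binarize the attribution map by keeping the k most attributed pixels
    k = max(1, int(top_frac * attribution.size))
    thresh = np.partition(attribution.ravel(), -k)[-k]
    attr_mask = attribution >= thresh

    # Intersection-over-union of the two binary masks
    intersection = np.logical_and(attr_mask, lesion_mask).sum()
    union = np.logical_or(attr_mask, lesion_mask).sum()
    return float(intersection) / union if union else 0.0
```

A model whose attribution mass concentrates inside the ophthalmologist-labeled lesion region scores near 1, while a model that attends to clinically irrelevant regions scores near 0, matching the abstract's notion of "likelihood to be trusted."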