Veldhuis Marthe S, Ariëns Simone, Ypma Rolf J F, Abeel Thomas, Benschop Corina C G
Delft University of Technology, Mekelweg 5, 2628 CD Delft, The Netherlands; Netherlands Forensic Institute, Division of Digital and Biometric Traces, Laan van Ypenburg 6, 2497 GB The Hague, The Netherlands.
Netherlands Forensic Institute, Division of Digital and Biometric Traces, Laan van Ypenburg 6, 2497 GB The Hague, The Netherlands.
Forensic Sci Int Genet. 2022 Jan;56:102632. doi: 10.1016/j.fsigen.2021.102632. Epub 2021 Nov 21.
Machine learning achieves good accuracy in determining the number of contributors (NOC) in short tandem repeat (STR) mixture DNA profiles. However, the models used so far are opaque to users: they output only a prediction, without any reasoning for that conclusion. We therefore leverage techniques from the field of explainable artificial intelligence (XAI) to help users understand why specific predictions are made. Whereas previous attempts at explainability for NOC estimation relied on simpler, more understandable models that achieve lower accuracy, we use techniques that can be applied to any machine learning model. Our explanations combine SHAP values and counterfactual examples for each prediction into a single visualization. Existing methods for generating counterfactuals assume uncorrelated features. This makes them unsuitable for the highly correlated features derived from STR data for NOC estimation, as these techniques simulate combinations of feature values that could not have resulted from an STR profile. For this reason, we constructed a new counterfactual method, Realistic Counterfactuals (ReCo), which generates realistic counterfactual explanations for correlated data. We show that ReCo outperforms state-of-the-art methods on traditional metrics, as well as on a novel realism score. A user evaluation of the visualization shows positive opinions among end-users, which is ultimately the most appropriate metric for assessing explanations in real-world settings.
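To make the central idea concrete (a counterfactual search constrained to realistic, mutually consistent feature values), here is a minimal invented sketch. It is not the ReCo algorithm: the toy "model", the two features (per-locus maximum allele count and profile-wide total allele count), and the realism constraint are all illustrative assumptions, not from the paper.

```python
# Illustrative only: a toy greedy counterfactual search with a realism
# constraint on correlated features. Not the ReCo method from the paper.

def predict_noc(features):
    """Toy NOC 'model': features = (max_alleles_per_locus, total_alleles).
    A single locus with k alleles needs at least ceil(k / 2) contributors."""
    max_alleles, total = features
    return (max_alleles + 1) // 2

def is_realistic(features):
    """Realism constraint: feature values must be mutually consistent.
    The per-locus maximum can never exceed the profile-wide total."""
    max_alleles, total = features
    return 1 <= max_alleles <= total

def counterfactual(features, target_noc):
    """Greedily nudge features until the prediction reaches target_noc,
    rejecting any candidate that violates the realism constraint."""
    current = list(features)
    for _ in range(100):
        if predict_noc(tuple(current)) == target_noc:
            return tuple(current)
        step = 1 if predict_noc(tuple(current)) < target_noc else -1
        # Keep the correlated features consistent while stepping.
        candidate = [current[0] + step, max(current[1], current[0] + step)]
        if not is_realistic(tuple(candidate)):
            return None
        current = candidate
    return None

# A profile the toy model calls 3-person; ask what would make it 2-person.
print(counterfactual((5, 9), 2))  # -> (4, 9)
```

The point of the sketch is the `is_realistic` check: an unconstrained counterfactual generator could happily propose a per-locus allele maximum larger than the profile's total allele count, a feature combination no STR profile could produce.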