Equipe de Statistique Appliquée, ESPCI Paris, INSERM, UMRS 1158 Neurophysiologie Respiratoire Expérimentale et Clinique, PSL Research University, Paris, France.
Institut de Recherche Criminelle de la Gendarmerie Nationale, Caserne Lange, France.
J Forensic Sci. 2021 Nov;66(6):2208-2217. doi: 10.1111/1556-4029.14818. Epub 2021 Aug 3.
The issue of distinguishing between the same-source and different-source hypotheses based on various types of traces is a generic problem in forensic science. This problem is often tackled with Bayesian approaches, which are able to provide a likelihood ratio that quantifies the relative strengths of evidence supporting each of the two competing hypotheses. Here, we focus on distance-based approaches, whose robustness and specifically whose capacity to deal with high-dimensional evidence are very different, and need to be evaluated and optimized. A unified framework for direct methods based on estimating the likelihoods of the distance between traces under each of the two competing hypotheses, and indirect methods using logistic regression to discriminate between same-source and different-source distance distributions, is presented. Whilst direct methods are more flexible, indirect methods are more robust and quite natural in machine learning. Moreover, indirect methods also enable the use of a vectorial distance, thus preventing the severe information loss suffered by scalar distance approaches. Direct and indirect methods are compared in terms of sensitivity, specificity, and robustness, with and without dimensionality reduction, with and without feature selection, on the example of hand odor profiles, a novel and challenging type of evidence in the field of forensics. Empirical evaluations on a large panel of 534 subjects and their 1690 odor traces show the significant superiority of the indirect methods, especially without dimensionality reduction, be it with or without feature selection.
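To make the contrast between the two families concrete, here is a minimal sketch on synthetic data; it is not the authors' implementation. Everything in it is an assumption made for illustration: the Gaussian "profiles" stand in for odor traces, the helper names (`make_pairs`, `direct_log_lr`, `fit_logistic`, `indirect_log_lr`) are hypothetical, the direct method uses a Gaussian kernel density estimate with a normal-reference bandwidth, and the indirect method uses a plain gradient-descent logistic regression on the vector of absolute feature differences.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_pairs(n_subjects, n_features=20, noise=0.5):
    """Synthetic stand-in for trace data: two noisy traces per subject (assumption)."""
    centers = rng.normal(size=(n_subjects, n_features))
    a = centers + noise * rng.normal(size=centers.shape)
    b = centers + noise * rng.normal(size=centers.shape)
    same = np.abs(a - b)                      # same-source vectorial distances
    diff = np.abs(a - np.roll(b, 1, axis=0))  # subject i paired with subject i-1
    return same, diff

def bandwidth(sample):
    """Normal-reference rule of thumb for a 1-D Gaussian kernel."""
    return 1.06 * sample.std() * len(sample) ** (-0.2)

def kde_logpdf(x, sample, bw):
    """Log of a 1-D Gaussian kernel density estimate evaluated at points x."""
    z = (x[:, None] - sample[None, :]) / bw
    dens = np.exp(-0.5 * z**2).mean(axis=1) / (bw * np.sqrt(2.0 * np.pi))
    return np.log(dens + 1e-300)              # guard against log(0)

def direct_log_lr(d_query, d_same, d_diff):
    """Direct method: ratio of the two estimated densities of the scalar distance."""
    return (kde_logpdf(d_query, d_same, bandwidth(d_same))
            - kde_logpdf(d_query, d_diff, bandwidth(d_diff)))

def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=1e-3):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # last column is the intercept
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * (Xb.T @ (p - y) / len(y) + l2 * w)
    return w

def indirect_log_lr(X_query, w, n_same, n_diff):
    """Indirect method: posterior log-odds minus the training prior log-odds."""
    Xb = np.hstack([X_query, np.ones((len(X_query), 1))])
    return Xb @ w - np.log(n_same / n_diff)

def scalar(V):
    """Collapse a vectorial distance to a scalar (Euclidean) distance."""
    return np.linalg.norm(V, axis=1)

# Training and test pairs come from disjoint synthetic subjects.
same_tr, diff_tr = make_pairs(150)
same_te, diff_te = make_pairs(50)
X_tr = np.vstack([same_tr, diff_tr])
y_tr = np.concatenate([np.ones(len(same_tr)), np.zeros(len(diff_tr))])
X_te = np.vstack([same_te, diff_te])
y_te = np.concatenate([np.ones(len(same_te)), np.zeros(len(diff_te))])

llr_direct = direct_log_lr(scalar(X_te), scalar(same_tr), scalar(diff_tr))
w = fit_logistic(X_tr, y_tr)
llr_indirect = indirect_log_lr(X_te, w, len(same_tr), len(diff_tr))

# A log-LR above 0 favors the same-source hypothesis.
acc_direct = np.mean((llr_direct > 0) == (y_te == 1))
acc_indirect = np.mean((llr_indirect > 0) == (y_te == 1))
print(f"direct: {acc_direct:.2f}  indirect: {acc_indirect:.2f}")
```

The direct route has to reduce each pair to one number before estimating densities, whereas the logistic model receives the full per-feature difference vector, which is the property the abstract credits with avoiding information loss. On this easy synthetic problem the two may perform similarly; the sketch shows only the mechanics, not the paper's empirical comparison.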