Warren Eric M, Handley John C, Sheets H David
SEP Forensic Consultants, Memphis, Tennessee, USA.
Simon Business School, University of Rochester, Rochester, New York, USA.
J Forensic Sci. 2025 Mar;70(2):589-606. doi: 10.1111/1556-4029.15686. Epub 2024 Dec 10.
The inconclusive category in forensics reporting is the appropriate response in many cases, but it poses challenges in estimating an "error rate". We discuss the use of a class of information-theoretic measures related to cross entropy as an alternative set of metrics that allows for performance evaluation of results presented using multi-category reporting scales. This paper shows how this class of performance metrics, and in particular the log likelihood ratio cost, which is already in use with likelihood ratio forensic reporting methods and in machine learning communities, can be readily adapted for use with the widely used multiple-category conclusion scales. Bayesian credible intervals on these metrics can be estimated using numerical methods. The application of these metrics to published test results is shown. It is demonstrated, using these test results, that reducing the number of categories used in a proficiency test from five or six to three increases the cross entropy, indicating that the higher number of categories was justified, as they increased the level of agreement with ground truth.
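As a rough illustration of the kind of metric the abstract describes, the sketch below computes the standard log likelihood ratio cost, Cllr = ½[mean over same-source trials of log2(1 + 1/LR) + mean over different-source trials of log2(1 + LR)], for a categorical conclusion scale. It is not the authors' procedure: the abstract does not say how likelihood ratios are attached to categories, so the sketch assumes each category's LR is estimated from the counts themselves. The category labels, the counts, and the helper names (`empirical_lrs`, `cllr`, `collapse`) are all hypothetical, and the counts are invented purely for illustration, not taken from any published test.

```python
import math

def empirical_lrs(same_counts, diff_counts):
    """Per-category LR estimated as P(category | same) / P(category | different).

    Assumption of this sketch: every category has a nonzero count in both
    ground-truth conditions, so no smoothing is needed.
    """
    if min(same_counts) <= 0 or min(diff_counts) <= 0:
        raise ValueError("this sketch assumes nonzero counts in every category")
    n_same, n_diff = sum(same_counts), sum(diff_counts)
    return [(s / n_same) / (d / n_diff) for s, d in zip(same_counts, diff_counts)]

def cllr(same_counts, diff_counts, lrs):
    """Log likelihood ratio cost in bits (lower is better; 1.0 is uninformative)."""
    n_same, n_diff = sum(same_counts), sum(diff_counts)
    same_term = sum(s * math.log2(1 + 1 / lr) for s, lr in zip(same_counts, lrs)) / n_same
    diff_term = sum(d * math.log2(1 + lr) for d, lr in zip(diff_counts, lrs)) / n_diff
    return 0.5 * (same_term + diff_term)

def collapse(counts, groups):
    """Merge categories: each group is a tuple of indices into the original scale."""
    return [sum(counts[i] for i in group) for group in groups]

# Hypothetical five-category scale with invented counts (illustration only).
labels = ["identification", "probable identification", "inconclusive",
          "probable exclusion", "exclusion"]
same = [60, 20, 15, 4, 1]   # responses on same-source trials
diff = [1, 4, 15, 20, 60]   # responses on different-source trials

cllr_five = cllr(same, diff, empirical_lrs(same, diff))

# Fold the two "probable" categories into "inconclusive" to get a three-category scale.
groups = [(0,), (1, 2, 3), (4,)]
same3, diff3 = collapse(same, groups), collapse(diff, groups)
cllr_three = cllr(same3, diff3, empirical_lrs(same3, diff3))

print(f"five categories:  Cllr = {cllr_five:.3f} bits")
print(f"three categories: Cllr = {cllr_three:.3f} bits")
```

With LRs estimated in-sample like this, merging categories can never lower the Cllr, so the sketch only shows the direction of the effect on toy numbers; it says nothing about its size on real proficiency-test data. It also omits the Bayesian credible intervals mentioned in the abstract.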