Measuring calibration of likelihood-ratio systems: A comparison of four metrics, including a new metric devPAV.

Author information

The Netherlands Forensic Institute, P.O. Box 24044, 2490 AA The Hague, The Netherlands.

Publication information

Forensic Sci Int. 2021 Apr;321:110722. doi: 10.1016/j.forsciint.2021.110722. Epub 2021 Feb 13.

Abstract

Numerical likelihood-ratio (LR) systems aim to calculate evidential strength for forensic evidence evaluation. Calibration of such LR-systems is essential: one does not want to overstate or understate the strength of the evidence. Metrics that measure calibration differ in their sensitivity to calibration errors in such systems. In this paper we compare four calibration metrics in a simulation study based on Gaussian log-LR distributions. Three calibration metrics are taken from the literature (Good, 1985; Royall, 1997; Ramos and Gonzalez-Rodriguez, 2013) [1-3], and a fourth metric is proposed by us. We evaluated these metrics on two performance criteria: differentiation (between well-calibrated and ill-calibrated LR-systems) and stability (of the value of the metric across a variety of well-calibrated LR-systems). Two metrics from the literature (the expected values of LR and of 1/LR, and the rate of misleading evidence stronger than 2) do not behave as desired in many simulated conditions. The third one (Cllr^cal) performs better, but our newly proposed metric (which we call devPAV) behaves equally well or clearly better under almost all simulated conditions. On the basis of this work, we recommend using both devPAV and Cllr^cal to measure calibration of LR-systems, with the current results indicating that devPAV is the preferred metric. In the future, the external validity of this comparison study can be extended by simulating non-Gaussian LR-distributions.
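To make the two simpler metrics concrete, the following is a minimal sketch and not the authors' code. It assumes a standard construction in which natural-log LRs are Gaussian with means +sigma^2/2 (under H1) and -sigma^2/2 (under H2) and a common variance sigma^2, which yields a well-calibrated system. The function names, the `shift` parameter (a constant bias added to every log LR to mimic miscalibration), and the values of sigma and the sample size are all illustrative assumptions. For a well-calibrated system the expected value of LR under H2 and of 1/LR under H1 both equal 1, and the rate of misleading evidence stronger than k under H2 cannot exceed 1/k.

```python
import numpy as np


def simulate_llrs(sigma, n, rng, shift=0.0):
    """Draw natural-log LRs under H1 and H2 from Gaussian distributions.

    With shift=0 the means are +/- sigma**2 / 2 with common variance
    sigma**2, which gives a well-calibrated system. A nonzero shift
    (hypothetical parameter) biases every log LR by a constant.
    """
    mu = sigma**2 / 2
    llr_h1 = rng.normal(mu, sigma, n) + shift
    llr_h2 = rng.normal(-mu, sigma, n) + shift
    return llr_h1, llr_h2


def expected_value_metrics(llr_h1, llr_h2):
    """Return (mean of 1/LR under H1, mean of LR under H2)."""
    return np.mean(np.exp(-llr_h1)), np.mean(np.exp(llr_h2))


def misleading_rates(llr_h1, llr_h2, k=2.0):
    """Return the rates of misleading evidence stronger than k.

    Under H1 this is the fraction with LR < 1/k; under H2 it is the
    fraction with LR > k.
    """
    log_k = np.log(k)
    return np.mean(llr_h1 < -log_k), np.mean(llr_h2 > log_k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for shift in (0.0, 1.0):  # 0.0: well calibrated; 1.0: biased upward
        h1, h2 = simulate_llrs(sigma=1.5, n=200_000, rng=rng, shift=shift)
        print("shift:", shift)
        print("  E[1/LR | H1], E[LR | H2]:", expected_value_metrics(h1, h2))
        print("  misleading rates (k=2):", misleading_rates(h1, h2))
```

With `shift=0.0` both expected values should come out close to 1, while a positive shift pushes the expected LR under H2 above 1. The sketch covers only the expected-value and misleading-evidence metrics. Cllr^cal and devPAV both rely on a pool-adjacent-violators (PAV) transformation that the abstract does not specify, so they are left out here.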

