

'Low' LRs obtained from DNA mixtures: On calibration and discrimination performance of probabilistic genotyping software.

Affiliations

Netherlands Forensic Institute, Division of Biological Traces, the Netherlands.

Oslo University Hospital, Department of Forensic Sciences, Norway.

Publication information

Forensic Sci Int Genet. 2024 Nov;73:103099. doi: 10.1016/j.fsigen.2024.103099. Epub 2024 Jul 27.

Abstract

The validity of a probabilistic genotyping (PG) system is typically demonstrated by following international guidelines for the developmental and internal validation of PG software. These guidelines mainly focus on discriminatory power. Very few studies have reported metrics that depend on the calibration of likelihood ratio (LR) systems. In this study, discriminatory power as well as various calibration metrics, such as Empirical Cross-Entropy (ECE) plots, pool adjacent violator (PAV) plots, log likelihood ratio cost (Cllr and Cllr), fiducial calibration discrepancy plots, and Turing's expectation, were examined using the publicly available PROVEDIt dataset. The aim was to gain deeper insight into the performance of a variety of PG software in the 'lower' LR ranges (∼LR 1-10,000), with a focus on DNAStatistX and EuroForMix, which use maximum likelihood estimation (MLE). This may be a driving force for end users to reconsider current LR thresholds for reporting. In previous studies, overstated 'low' LRs were observed for this PG software. However, applying (arbitrarily) high LR thresholds for reporting wastes relevant evidential value. This study demonstrates, based on calibration performance, that previously reported LR thresholds can be lowered or even discarded. For LRs >1, there was no evidence of miscalibration above LR ∼1000 when using Fst 0.01. Below this LR value, miscalibration was observed. Calibration performance generally improved with the use of Fst 0.03, but the extent of the improvement depended on the dataset: results ranged from miscalibration up to LR ∼100 to no evidence of miscalibration, the latter being similar to the PG software that uses different methods to model peak height (HMC and STRmix). This study demonstrates that practitioners using MLE-based models should be careful when reporting low LR ranges, though applying arbitrarily high LR thresholds is discouraged. This study also highlights various calibration metrics that are useful in understanding the performance of a PG system.
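
For readers unfamiliar with the calibration metrics named in the abstract, the following is a minimal Python sketch of two of them: the log likelihood ratio cost (Cllr) in its standard definition, and a Turing-style check that the mean LR over non-contributors is close to 1. The function names and the toy LR values are illustrative assumptions; they are not taken from the study, the PROVEDIt dataset, or any PG software.

```python
import math


def cllr(lrs_h1_true, lrs_h2_true):
    """Log likelihood ratio cost (standard definition).

    lrs_h1_true: LRs from comparisons where the person of interest
                 truly contributed (H1 true).
    lrs_h2_true: LRs from comparisons with true non-contributors (H2 true).
    Lower is better; a system that always reports LR = 1 scores exactly 1.
    """
    # Penalise small LRs when H1 is true.
    penalty_h1 = sum(math.log2(1 + 1 / lr) for lr in lrs_h1_true) / len(lrs_h1_true)
    # Penalise large LRs when H2 is true.
    penalty_h2 = sum(math.log2(1 + lr) for lr in lrs_h2_true) / len(lrs_h2_true)
    return 0.5 * (penalty_h1 + penalty_h2)


def mean_lr_non_contributors(lrs_h2_true):
    """Average LR over true non-contributors.

    For a well-calibrated system this average is expected to be about 1
    (Turing's expectation); a much larger value hints at overstated LRs.
    """
    return sum(lrs_h2_true) / len(lrs_h2_true)


if __name__ == "__main__":
    # Hypothetical toy values, for illustration only.
    contributors = [50.0, 800.0, 12000.0, 3.0]
    non_contributors = [0.001, 0.2, 0.01, 4.0]
    print("Cllr:", cllr(contributors, non_contributors))
    print("Mean LR (H2 true):", mean_lr_non_contributors(non_contributors))
```

Both functions work on plain (not log-transformed) LRs and require all values to be strictly positive. Neither replaces the graphical diagnostics listed in the abstract (ECE, PAV and fiducial calibration discrepancy plots), which show where in the LR range any miscalibration occurs.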

