Biotype GmbH, Dresden, 01109, Germany; Technische Universität Dresden, Faculty of Computer Science, Dresden, 01187, Germany.
qualitype GmbH, Dresden, 01109, Germany.
Forensic Sci Int Genet. 2022 Sep;60:102744. doi: 10.1016/j.fsigen.2022.102744. Epub 2022 Jul 11.
Analysing mixed DNA profiles is a common task in forensic genetics. Due to the complexity of the data, such analysis is often performed using Markov Chain Monte Carlo (MCMC)-based genotyping algorithms. These trade off precision against execution time. When default settings (including default chain lengths) are used, changes as large as 10-fold in the inferred log-likelihood ratios (LR) are observed when the software is run twice on the same case. So far, this uncertainty has been attributed to the stochasticity of MCMC algorithms. Since LRs translate directly to the strength of the evidence in a criminal trial, forensic laboratories desire LRs with small run-to-run variability.
We present the use of a Hamiltonian Monte Carlo (HMC) algorithm that reduces run-to-run variability in forensic DNA mixture deconvolution by around an order of magnitude without increased runtime. We achieve this by enforcing strict convergence criteria. We show that the choice of convergence metric strongly influences precision. We validate our method by reproducing previously published results for benchmark DNA mixtures (MIX05, MIX13, and ProvedIt). We also present a complete software implementation of our algorithm that is able to leverage GPU acceleration for the inference process. In the benchmark mixtures, on consumer-grade hardware, the runtime is less than 7 min for 3 contributors, less than 35 min for 4 contributors, and less than an hour for 5 contributors with one known contributor.
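The idea of running HMC chains until a strict convergence criterion is met can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy one-dimensional standard normal target instead of a DNA mixture model, and it assumes the Gelman-Rubin statistic (R-hat) as the convergence metric, since the abstract does not name the metric used. The function names, the step size, the batch size, and the threshold of 1.01 are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def leapfrog(q, p, grad_u, step, n_steps):
    """Integrate Hamiltonian dynamics with the leapfrog scheme."""
    p -= 0.5 * step * grad_u(q)        # initial half step for momentum
    for i in range(n_steps):
        q += step * p                  # full step for position
        if i < n_steps - 1:
            p -= step * grad_u(q)      # full step for momentum
    p -= 0.5 * step * grad_u(q)        # final half step for momentum
    return q, p


def hmc_chain(u, grad_u, q0, n_samples, rng, step=0.15, n_steps=10):
    """Draw n_samples from exp(-u(q)) with basic HMC, starting at q0."""
    samples, q = [], q0
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)        # resample momentum
        q_new, p_new = leapfrog(q, p, grad_u, step, n_steps)
        h_old = u(q) + 0.5 * p * p
        h_new = u(q_new) + 0.5 * p_new * p_new
        # Metropolis correction for the integration error
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples


def gelman_rubin(chains):
    """Gelman-Rubin R-hat for equally long chains of scalar samples."""
    n = len(chains[0])
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance([statistics.fmean(c) for c in chains])
    var_hat = (n - 1) / n * within + between / n
    return math.sqrt(var_hat / within)


def sample_until_converged(u, grad_u, starts, rng,
                           batch=500, threshold=1.01, max_batches=20):
    """Extend all chains in batches until R-hat drops below threshold."""
    chains = [[] for _ in starts]
    state = list(starts)
    kept, r_hat = chains, float("inf")
    for _ in range(max_batches):
        for i in range(len(chains)):
            new = hmc_chain(u, grad_u, state[i], batch, rng)
            chains[i].extend(new)
            state[i] = new[-1]
        # Discard the first half of every chain as warm-up
        kept = [c[len(c) // 2:] for c in chains]
        r_hat = gelman_rubin(kept)
        if r_hat < threshold:
            break
    return kept, r_hat


if __name__ == "__main__":
    # Toy target (assumption): standard normal, u(q) = q^2 / 2
    rng = random.Random(0)
    kept, r_hat = sample_until_converged(
        lambda q: 0.5 * q * q, lambda q: q, [-5.0, -2.0, 2.0, 5.0], rng)
    print("R-hat:", r_hat, "samples kept per chain:", len(kept[0]))
```

Stopping on a diagnostic rather than after a fixed chain length means the amount of sampling adapts to the problem, which is one plausible way to reduce run-to-run variability. A tighter threshold costs more samples, and a different metric may stop at a different point.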