Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave, Suite 600, Seattle, WA 98121, USA.
Popul Health Metr. 2011 Aug 4;9:29. doi: 10.1186/1478-7954-9-29.
Computer-coded verbal autopsy (CCVA) is a promising alternative to the standard approach of physician-certified verbal autopsy (PCVA), because of its high speed, low cost, and reliability. This study introduces a new CCVA technique and validates its performance using defined clinical diagnostic criteria as a gold standard for a multisite sample of 12,542 verbal autopsies (VAs).
The Random Forest (RF) Method from machine learning (ML) was adapted to predict cause of death by training random forests to distinguish between each pair of causes, and then combining the results through a novel ranking technique. We assessed quality of the new method at the individual level using chance-corrected concordance and at the population level using cause-specific mortality fraction (CSMF) accuracy as well as linear regression. We also compared the quality of RF to PCVA for all of these metrics. We performed this analysis separately for adult, child, and neonatal VAs. We also assessed the variation in performance with and without household recall of health care experience (HCE).
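The two headline metrics can be illustrated with a minimal Python sketch. The abstract names the metrics but does not give their formulas, so the definitions below are an assumption: they follow the commonly used forms, in which chance-corrected concordance for a cause is (sensitivity - 1/N) / (1 - 1/N) for N causes, averaged across causes, and CSMF accuracy is 1 minus the summed absolute CSMF error divided by 2(1 - min true CSMF). The function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def chance_corrected_concordance(true_causes, predicted_causes):
    """Mean chance-corrected concordance across causes (assumed standard form)."""
    causes = sorted(set(true_causes))
    n_causes = len(causes)
    if n_causes < 2:
        raise ValueError("need at least two causes in the gold standard")
    scores = []
    for cause in causes:
        # Indices of deaths whose gold-standard cause is this cause.
        idx = [i for i, t in enumerate(true_causes) if t == cause]
        # Sensitivity: share of those deaths assigned the correct cause.
        sensitivity = sum(predicted_causes[i] == cause for i in idx) / len(idx)
        # Rescale so random assignment scores 0 and perfect assignment scores 1.
        scores.append((sensitivity - 1 / n_causes) / (1 - 1 / n_causes))
    return sum(scores) / n_causes


def csmf_accuracy(true_causes, predicted_causes):
    """CSMF accuracy (assumed standard form); 1.0 means perfect fractions."""
    total = len(true_causes)
    true_counts = Counter(true_causes)
    pred_counts = Counter(predicted_causes)
    if len(true_counts) < 2:
        raise ValueError("need at least two causes in the gold standard")
    causes = set(true_counts) | set(pred_counts)
    abs_error = sum(
        abs(true_counts[c] / total - pred_counts[c] / total) for c in causes
    )
    # Minimum is taken over causes present in the gold standard only;
    # a cause list with unobserved causes would have a minimum of 0.
    min_true = min(count / total for count in true_counts.values())
    return 1 - abs_error / (2 * (1 - min_true))


# Hypothetical toy data: four deaths, two causes.
true_labels = ["a", "a", "b", "b"]
pred_labels = ["a", "b", "b", "b"]
ccc = chance_corrected_concordance(true_labels, pred_labels)
acc = csmf_accuracy(true_labels, pred_labels)
```

In this sketch the individual-level metric penalizes each misassigned death, while the population-level metric only compares predicted and true cause fractions, which is why the two are reported separately.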
For all metrics, for all settings, RF was as good as or better than PCVA, with the exception of a nonsignificantly lower CSMF accuracy for neonates with HCE information. With HCE, the chance-corrected concordance of RF was 3.4 percentage points higher for adults, 3.2 percentage points higher for children, and 1.6 percentage points higher for neonates. The CSMF accuracy was 0.097 higher for adults, 0.097 higher for children, and 0.007 lower for neonates. Without HCE, the chance-corrected concordance of RF was 8.1 percentage points higher than PCVA for adults, 10.2 percentage points higher for children, and 5.9 percentage points higher for neonates. The CSMF accuracy was higher for RF by 0.102 for adults, 0.131 for children, and 0.025 for neonates.
We found that our RF Method outperformed the PCVA method in terms of chance-corrected concordance and CSMF accuracy for adult and child VA with and without HCE and for neonatal VA without HCE. It is also preferable to PCVA in terms of time and cost. Therefore, we recommend it as the technique of choice for analyzing past and current verbal autopsies.