Reeves B C, Quigley M
Department of Social Medicine, University of Bristol, UK.
Int J Epidemiol. 1997 Oct;26(5):1080-9. doi: 10.1093/ije/26.5.1080.
Verbal autopsy (VA) is an indirect method for estimating cause-specific mortality. In most previous studies, cause of death has been assigned from verbal autopsy data either by expert algorithms or by physician review. Both of these methods may have poor validity. In addition, physician review is time-consuming and has to be carried out by doctors. A range of methods exist for deriving classification rules from data. Such rules are quick and simple to apply and in many situations perform as well as experts.
This paper has two aims. First, it considers the advantages and disadvantages of the three main methods for deriving classification rules empirically: (a) linear and other discriminant techniques, (b) probability density estimation and (c) decision trees and rule-based methods. Second, it reviews the factors that need to be taken into account when choosing a classification method for assigning cause of death from VA data.
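As an illustration of category (b), the sketch below uses a naive Bayes classifier, one simple form of probability density estimation, to assign one of several causes from binary VA symptom indicators. It is a minimal sketch under stated assumptions, not the method evaluated in the paper: the symptom names, causes and training records are hypothetical, symptoms are assumed conditionally independent given cause, and Laplace smoothing (`alpha`) is an illustrative choice.

```python
import math
from collections import Counter, defaultdict

# Hypothetical VA training data: (set of reported symptoms, validated cause).
# Symptom and cause names are illustrative only.
TRAINING = [
    ({"fever", "cough", "fast_breathing"}, "pneumonia"),
    ({"cough", "fast_breathing"}, "pneumonia"),
    ({"fever", "cough"}, "pneumonia"),
    ({"diarrhoea", "sunken_eyes"}, "diarrhoea"),
    ({"diarrhoea", "fever"}, "diarrhoea"),
    ({"diarrhoea", "sunken_eyes", "fever"}, "diarrhoea"),
    ({"fever", "convulsions"}, "malaria"),
    ({"fever", "convulsions", "pallor"}, "malaria"),
]


def train_naive_bayes(records, alpha=1.0):
    """Estimate P(cause) and P(symptom present | cause) with Laplace smoothing."""
    symptoms = sorted(set().union(*(s for s, _ in records)))
    cause_counts = Counter(c for _, c in records)
    present = defaultdict(Counter)  # present[cause][symptom] = cases reporting it
    for reported, cause in records:
        for s in reported:
            present[cause][s] += 1
    n = len(records)
    priors = {c: k / n for c, k in cause_counts.items()}
    likelihood = {
        c: {s: (present[c][s] + alpha) / (cause_counts[c] + 2 * alpha) for s in symptoms}
        for c in cause_counts
    }
    return symptoms, priors, likelihood


def classify(model, reported):
    """Return the cause with the highest posterior (assumes conditional independence)."""
    symptoms, priors, likelihood = model
    scores = {}
    for c, prior in priors.items():
        log_p = math.log(prior)
        for s in symptoms:
            p = likelihood[c][s]
            # Absent symptoms contribute P(absent | cause) = 1 - p.
            log_p += math.log(p if s in reported else 1.0 - p)
        scores[c] = log_p
    return max(scores, key=scores.get)


model = train_naive_bayes(TRAINING)
print(classify(model, {"fever", "cough", "fast_breathing"}))  # prints "pneumonia"
```

Because every case receives a single most probable cause, a classifier of this kind can be applied to a whole VA dataset to tabulate a pattern of mortality across several causes.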
Four main factors influence the choice of classification method: (a) the purpose for which a classifier is being developed, (b) the number of validated causes of death assigned to each case, (c) the characteristics of the VA data and (d) the need for a classifier to be comprehensible. When the objective is to estimate mortality from a single cause of death, logistic regression should be used. When the objective is to determine patterns of mortality, the choice of method will depend on the above factors in ways which are elaborated in the paper.
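For the single-cause case, the sketch below fits a logistic regression that predicts whether a death was due to one cause of interest from binary VA symptom indicators. It is a minimal, dependency-free sketch: the data are hypothetical, the model is fitted by plain batch gradient ascent rather than by a statistical package, and estimating the cause-specific mortality fraction as the mean predicted probability is an assumption made for illustration, not a procedure taken from the paper.

```python
import math

# Hypothetical data: binary indicators [fever, cough, fast_breathing] and
# whether the validated cause was the single cause of interest (1) or not (0).
X = [
    [1, 1, 1], [0, 1, 1], [1, 1, 0], [1, 0, 1], [1, 1, 0],
    [1, 0, 0], [0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1],
]
y = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit intercept and coefficients by batch gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    b0, b = 0.0, [0.0] * k
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            err = yi - sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi)))
            g0 += err
            for j in range(k):
                g[j] += err * xi[j]
        b0 += lr * g0 / n
        b = [bj + lr * gj / n for bj, gj in zip(b, g)]
    return b0, b


def predict_proba(model, x):
    """Predicted probability that a death with indicators x was due to the cause of interest."""
    b0, b = model
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))


model = fit_logistic(X, y)
# Assumed estimator: mean predicted probability over a set of VA interviews.
csmf = sum(predict_proba(model, x) for x in X) / len(X)
print(round(csmf, 2))
```

With an intercept in the model, the fitted probabilities on the training data average to the observed proportion of deaths from the cause (here 0.5), which is one reason a probability-based model suits a single-cause estimate.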
Choice of classification method for assigning cause of death needs to be considered when designing a VA validation study. Comparison of the performance of classifiers derived using different methods requires a large VA dataset, which is not currently available.