Vach W, Blettner M
Institute of Medical Biometry and Informatics, University of Freiburg, Germany.
Am J Epidemiol. 1991 Oct 15;134(8):895-907. doi: 10.1093/oxfordjournals.aje.a116164.
The effects of missing values for a confounding variable are investigated in the setting of case-control studies in which, for simplicity, the effect of one binary risk factor and one categoric confounding variable on disease risk is under investigation. Some ad hoc techniques with which to deal with missing values are examined under different assumptions about the missing-data mechanism. Examples are given to illustrate that the magnitude of the bias that is introduced by applying an inadequate procedure can be large under circumstances that occur frequently in empiric research. This is true even for so-called complete case analysis, i.e., when only data on subjects with complete information are used. Appropriate bias corrections are derived. Making use of data on those subjects who are neglected in complete case analysis by creating an additional category always results in biased estimation. An alternative is to allocate these subjects to the cells of the contingency table in an appropriate manner. This approach yields consistent estimates if the data are missing at random. Choosing an appropriate method for dealing with missing values always requires some knowledge of why the data are missing. This suggests that investigators should carry out validation studies to understand whether the missing values occur randomly across the study population or occur more frequently in specific subgroups.
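The contrast between complete case analysis and the additional-category approach can be illustrated with a small deterministic calculation. The sketch below is not taken from the paper. It assumes a binary confounder, a true conditional odds ratio of 2, a Mantel-Haenszel summary estimator, and made-up prevalences and missingness probabilities; all function and variable names are hypothetical. It works on expected cell proportions rather than simulated data, so the results are free of sampling noise.

```python
from math import exp, log

TRUE_OR = 2.0  # assumed conditional odds ratio for the binary risk factor


def expit(x):
    return 1.0 / (1.0 + exp(-x))


def cell_probs():
    """Expected proportions P(E, C | D) for cases (D=1) and controls (D=0).

    All numbers are illustrative assumptions: the confounder C raises both
    the exposure prevalence and the disease risk.
    """
    p_c = {0: 0.5, 1: 0.5}
    p_e_given_c = {0: 0.2, 1: 0.6}
    cells = {}
    for d in (0, 1):
        weights = {}
        for c in (0, 1):
            for e in (0, 1):
                p_ec = p_c[c] * (p_e_given_c[c] if e else 1.0 - p_e_given_c[c])
                risk = expit(-3.0 + log(TRUE_OR) * e + log(3.0) * c)
                weights[(e, c)] = p_ec * (risk if d else 1.0 - risk)
        total = sum(weights.values())
        for (e, c), w in weights.items():
            cells[(d, e, c)] = w / total
    return cells


def mantel_haenszel(strata):
    """Mantel-Haenszel odds ratio over 2x2 tables keyed by (disease, exposure)."""
    num = den = 0.0
    for t in strata:
        n = sum(t.values())
        num += t[(1, 1)] * t[(0, 0)] / n
        den += t[(1, 0)] * t[(0, 1)] / n
    return num / den


def analyse(cells, p_observed):
    """Return (complete-case OR, additional-category OR).

    p_observed(d, e, c) is the assumed probability that the confounder
    value is recorded for a subject in that cell.
    """
    observed = {0: {}, 1: {}}
    missing = {}
    for (d, e, c), p in cells.items():
        q = p_observed(d, e, c)
        observed[c][(d, e)] = p * q
        missing[(d, e)] = missing.get((d, e), 0.0) + p * (1.0 - q)
    complete_case = mantel_haenszel([observed[0], observed[1]])
    extra_category = mantel_haenszel([observed[0], observed[1], missing])
    return complete_case, extra_category


cells = cell_probs()

# Mechanism 1: confounder missing completely at random (30% missing everywhere).
cc_mcar, extra_mcar = analyse(cells, lambda d, e, c: 0.7)

# Mechanism 2: confounder more often missing for exposed cases.
cc_diff, extra_diff = analyse(
    cells, lambda d, e, c: 0.5 if (d == 1 and e == 1) else 0.9
)

print(f"true OR:                        {TRUE_OR:.3f}")
print(f"MCAR, complete case:            {cc_mcar:.3f}")
print(f"MCAR, additional category:      {extra_mcar:.3f}")
print(f"differential, complete case:    {cc_diff:.3f}")
print(f"differential, additional cat.:  {extra_diff:.3f}")
```

Under these assumed numbers, complete case analysis recovers the true odds ratio when missingness is completely at random, while the additional category pulls the estimate toward the confounded crude odds ratio. When missingness depends jointly on disease status and exposure, the complete case estimate is itself biased, by the factor q(1,1)q(0,0) / (q(1,0)q(0,1)) formed from the observation probabilities.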