Curran D, Bacchi M, Schmitz S F, Molenberghs G, Sylvester R J
European Organization for Research and Treatment of Cancer (EORTC), Data Center, Brussels, Belgium.
Stat Med. 1998;17(5-7):739-56. doi: 10.1002/(sici)1097-0258(19980315/15)17:5/7<739::aid-sim818>3.0.co;2-m.
This paper discusses methods of identifying the types of missingness in quality of life (QOL) data in cancer clinical trials. The first approach involves collecting information on why the QOL questionnaires were not completed. Based on the reasons provided, one may be able to distinguish the mechanisms causing missing data. The second approach is to model the missing data mechanism and perform hypothesis tests to determine the missing data process. Two methods of testing whether missing data are missing completely at random (MCAR) are presented and applied to incomplete longitudinal QOL data obtained from international multi-centre cancer clinical trials. The first method (Ridout, 1991) is based on logistic regression and the second method (Park and Davis, 1993) is based on an adaptation of weighted least squares. In one application (advanced breast cancer) the missing data were unlikely to be MCAR. In the second application (adjuvant breast cancer) the missing data mechanism depended on the QOL scale under study. MCAR and missing at random (MAR) have distinct consequences for data analysis, so it is relevant to distinguish between them. However, if either MCAR or MAR holds, likelihood or Bayesian inferences can be based solely on the observed data, although for MAR, depending on the research question, modelling the dropout mechanism may still be necessary. Distinguishing between MAR and missing not at random (MNAR) is not trivial and relies on fundamentally untestable assumptions.
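As a rough illustration of the logistic-regression idea attributed to Ridout (1991) in the abstract, the sketch below regresses a dropout indicator on the previously observed QOL score and applies a likelihood-ratio test to the slope. The function names, the single-covariate model, the Newton-Raphson fit and the toy data are all assumptions made for illustration; they are not taken from the paper, which may use additional covariates and a different test statistic.

```python
import math


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson.

    Returns (b0, b1, log-likelihood). Hypothetical helper, not from the paper.
    Assumes the data are not perfectly separated.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # score for the intercept
            g1 += (yi - p) * xi       # score for the slope
            h00 += w                  # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step: solve H * d = g
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    loglik = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        loglik += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return b0, b1, loglik


def mcar_dropout_test(prev_score, dropped):
    """Likelihood-ratio test of H0: dropout does not depend on prev_score.

    Returns (slope, LR statistic, p-value on 1 degree of freedom).
    """
    n = len(dropped)
    n1 = sum(dropped)
    pbar = n1 / n
    # Intercept-only model has a closed-form maximised log-likelihood.
    ll0 = n1 * math.log(pbar) + (n - n1) * math.log(1.0 - pbar)
    _, slope, ll1 = fit_logistic(prev_score, dropped)
    lr = max(0.0, 2.0 * (ll1 - ll0))
    # Chi-square survival function with 1 df: P(X > lr) = erfc(sqrt(lr / 2)).
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return slope, lr, p_value


# Toy data (invented): previous QOL score and whether the next form is missing.
prev_score = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
dropped = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]

slope, lr, p_value = mcar_dropout_test(prev_score, dropped)
print(f"slope={slope:.4f}  LR={lr:.3f}  p={p_value:.4f}")
```

A small p-value here would suggest that dropout depends on the previously observed score, which is evidence against MCAR. A large p-value does not establish MCAR, and a test of this kind cannot separate MAR from MNAR, since that distinction rests on the unobserved values.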