An overview of practical approaches for handling missing data in clinical trials.

Author information

DeSouza Cynthia M, Legedza Anna T R, Sankoh Abdul J

Affiliation

Department of Biometrics, Vertex Pharmaceuticals, Inc., Cambridge, Massachusetts, USA.

Publication information

J Biopharm Stat. 2009 Nov;19(6):1055-73. doi: 10.1080/10543400903242795.

PMID:20183464

Abstract

For a variety of reasons including poorly designed case report forms (CRFs), incomplete or invalid CRF data entries, and premature treatment or study discontinuations, missing data is a common phenomenon in controlled clinical trials. With the widely accepted use of the intent-to-treat (ITT) analysis dataset as the primary analysis dataset for the analysis of controlled clinical trial data, the presence of missing data could lead to complicated data analysis strategies and subsequently to controversy in the interpretation of trial results. In this article, we review the mechanisms of missing data and some common approaches to analyzing missing data with an emphasis on study dropouts. We discuss the importance of understanding the reasons for study dropouts with ways to assess the mechanisms of missingness. Finally, we discuss the results of a comparative Monte Carlo investigation of the performance characteristics of commonly utilized statistical methods for the analysis of clinical trial data with dropouts. The methods investigated include the mixed effects model for repeated measurements (MMRM), weighted and unweighted generalized estimating equations (GEE) method for the available case data, multiple-imputation-based GEE (MI-GEE), complete case (CC) analysis of covariance (ANCOVA), and last observation carried forward (LOCF) ANCOVA. Simulation experiments for the repeated measures model with missing at random (MAR) dropout, under varying dropout rates and intrasubject correlation, show that the LOCF, ANCOVA, and weighted GEE methods perform poorly in terms of percent relative bias for estimating a difference in means effect, while the MI-GEE and weighted GEE methods both have less power for rejecting a zero difference in means hypothesis.
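To make two of the simpler approaches named above concrete, the following minimal sketch applies last observation carried forward (LOCF) and a complete case (CC) restriction to a small repeated-measures dataset with dropout. It is an illustration only, not the authors' simulation: the data values, the function names (`locf`, `is_complete`, `mean_final`, `diff_in_means`), and the use of a plain difference in final-visit means in place of ANCOVA are all assumptions made for this example.

```python
from typing import List, Optional

# One subject's outcomes by visit; None marks a visit missed after dropout.
Visits = List[Optional[float]]


def locf(visits: Visits) -> Visits:
    """Fill each missing visit with the most recent observed value."""
    filled: Visits = []
    last: Optional[float] = None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def is_complete(visits: Visits) -> bool:
    """True when the subject has an observed value at every visit."""
    return all(value is not None for value in visits)


def mean_final(subjects: List[Visits]) -> float:
    """Mean of the final-visit value over subjects that have one."""
    finals = [s[-1] for s in subjects if s[-1] is not None]
    return sum(finals) / len(finals)


def diff_in_means(treated: List[Visits], control: List[Visits]) -> float:
    """Unadjusted treatment-minus-control difference at the final visit."""
    return mean_final(treated) - mean_final(control)


# Hypothetical toy data (not from the paper): three visits per subject.
treated = [[10.0, 12.0, 14.0], [10.0, 11.0, None], [9.0, None, None]]
control = [[10.0, 10.5, 11.0], [9.0, 9.5, 10.0], [10.0, None, None]]

# LOCF keeps every subject but imputes the final visit for dropouts.
locf_estimate = diff_in_means(
    [locf(s) for s in treated], [locf(s) for s in control]
)

# CC analysis discards every subject with any missing visit.
cc_estimate = diff_in_means(
    [s for s in treated if is_complete(s)],
    [s for s in control if is_complete(s)],
)

print("LOCF difference in means:", locf_estimate)
print("CC difference in means:  ", cc_estimate)
```

On this toy dataset the two approaches give noticeably different estimates from the same observed values, which is the kind of sensitivity to the handling of dropouts that a comparative simulation is designed to quantify. The model-based methods in the abstract (MMRM, GEE, MI-GEE) require a statistical modelling library and are not sketched here.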
