Holman Rebecca, Glas Cees A W, de Haan Rob J
Department of Clinical Epidemiology and Biostatistics, Academic Medical Center, Amsterdam, The Netherlands.
Control Clin Trials. 2003 Aug;24(4):390-410. doi: 10.1016/s0197-2456(03)00061-8.
Patient-relevant outcomes, measured using questionnaires, are becoming increasingly popular endpoints in randomized clinical trials (RCTs). Recently, interest in using item response theory (IRT) to analyze the responses to such questionnaires has increased. In this paper, we used a simulation study to examine the small-sample behavior of a test statistic designed to examine the difference in average latent trait level between two groups when the two-parameter logistic IRT model for binary data is used. The simulation study was extended to examine the relationship between the number of patients required in each arm of an RCT, the number of items used to assess them, and the power to detect minimal, moderate, and substantial treatment effects. The results show that the number of patients required in each arm of an RCT varies with the number of items used to assess the patients. However, as long as at least 20 items are used, the number of items barely affects the number of patients required in each arm to detect effect sizes of 0.5 and 0.8 with a power of 80%. By contrast, the number of items has a greater effect on the number of patients required to detect an effect size of 0.2 with a power of 80%. For instance, if only five randomly selected items are used, 950 patients must be included in each arm, but if 50 items are used, only 450 are required in each arm. These results indicate that if an RCT is designed to detect small effects, it is inadvisable to use very short instruments analyzed using IRT. Finally, the SF-36, SF-12, and SF-8 instruments were considered in the same framework. Since these instruments consist of items scored in more than two categories, slightly different results were obtained.
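The setting described above can be illustrated with a minimal sketch. It is not the authors' simulation code or test statistic. It shows two things: how binary responses can be generated under a two-parameter logistic model, and the textbook normal-approximation sample size for comparing two group means, which serves as a reference point for the reported figures. Everything not stated in the abstract is an assumption made for illustration: the function names (`p_positive`, `simulate_arm`, `patients_per_arm`), the two-sided significance level of 0.05, latent traits drawn from a normal distribution with unit variance, and the ranges used for the item parameters.

```python
import math
import random
from statistics import NormalDist


def p_positive(theta, a, b):
    """2PL probability of a positive response to one binary item.

    theta: latent trait level, a: discrimination, b: difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_arm(n_patients, items, mean_theta, rng):
    """Simulate binary responses for one trial arm.

    Assumption: latent traits are N(mean_theta, 1), so a shift in
    mean_theta is directly a standardized effect size.
    """
    responses = []
    for _ in range(n_patients):
        theta = rng.gauss(mean_theta, 1.0)
        row = [int(rng.random() < p_positive(theta, a, b)) for a, b in items]
        responses.append(row)
    return responses


def patients_per_arm(effect_size, power=0.80, alpha=0.05):
    """Normal-approximation sample size per arm for a two-sided test.

    Treats the latent trait as if it were observed without measurement
    error, so it is a lower reference point, not the paper's IRT result.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * (z_alpha + z_power) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical item bank: 20 items with arbitrary (a, b) parameters.
    items = [(rng.uniform(0.5, 2.0), rng.gauss(0.0, 1.0)) for _ in range(20)]
    control = simulate_arm(100, items, 0.0, rng)
    treated = simulate_arm(100, items, 0.5, rng)
    print("rows per arm:", len(control), len(treated))
    for d in (0.2, 0.5, 0.8):
        print("effect size", d, "->", patients_per_arm(d), "patients per arm")
```

Under these assumptions the reference calculation gives 393, 63, and 25 patients per arm for effect sizes of 0.2, 0.5, and 0.8. The 450 and 950 patients per arm reported for an effect size of 0.2 with 50 and 5 items both exceed 393, which is in line with the latent trait being measured with more error when fewer items are used.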