Power analysis in randomized clinical trials based on item response theory.

Authors

Holman Rebecca, Glas Cees A W, de Haan Rob J

Affiliation

Department of Clinical Epidemiology and Biostatistics, Academic Medical Center, Amsterdam, The Netherlands.

Publication information

Control Clin Trials. 2003 Aug;24(4):390-410. doi: 10.1016/s0197-2456(03)00061-8.

DOI:10.1016/s0197-2456(03)00061-8

PMID:12865034

Abstract

Patient-relevant outcomes, measured using questionnaires, are becoming increasingly popular endpoints in randomized clinical trials (RCTs). Recently, interest in the use of item response theory (IRT) to analyze the responses to such questionnaires has increased. In this paper, we used a simulation study to examine the small-sample behavior of a test statistic designed to examine the difference in average latent trait level between two groups when the two-parameter logistic IRT model for binary data is used. The simulation study was extended to examine the relationship between the number of patients required in each arm of an RCT, the number of items used to assess them, and the power to detect minimal, moderate, and substantial treatment effects. The results show that the number of patients required in each arm of an RCT varies with the number of items used to assess the patients. However, as long as at least 20 items are used, the number of items barely affects the number of patients required in each arm of an RCT to detect effect sizes of 0.5 and 0.8 with a power of 80%. By contrast, the number of items used has a larger effect on the number of patients required to detect an effect size of 0.2 with a power of 80%. For instance, if only five randomly selected items are used, it is necessary to include 950 patients in each arm, but if 50 items are used, only 450 are required in each arm. These results indicate that if an RCT is to be designed to detect small effects, it is inadvisable to use very short instruments analyzed using IRT. Finally, the SF-36, SF-12, and SF-8 instruments were considered in the same framework. Since these instruments consist of items scored in more than two categories, slightly different results were obtained.
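The kind of simulation the abstract describes can be illustrated with a rough sketch. It is not the authors' procedure: the abstract does not give the test statistic or the item parameters, so the code below swaps in a plain two-sample z-test on sum scores and draws hypothetical item parameters (discrimination uniform on 0.5 to 2.0, difficulty standard normal). The function names `simulate_arm` and `estimate_power` are likewise made up for illustration. Only the two-parameter logistic response model, the two-arm layout, and the idea of estimating power by repeated simulation come from the abstract.

```python
import math
import random
from statistics import NormalDist, mean, variance


def simulate_arm(n_patients, items, shift, rng):
    """Simulate sum scores for one trial arm under a 2PL model.

    Latent trait levels are drawn from N(shift, 1); `items` is a list of
    (discrimination, difficulty) pairs.
    """
    scores = []
    for _ in range(n_patients):
        theta = rng.gauss(shift, 1.0)
        score = 0
        for a, b in items:
            # 2PL probability of a positive response to this item
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            score += rng.random() < p
        scores.append(score)
    return scores


def estimate_power(n_patients, n_items, effect, n_reps=200, alpha=0.05, seed=0):
    """Estimate power as the fraction of simulated trials that reject H0.

    ASSUMPTION: a two-sided z-test on sum scores stands in for the
    IRT-based test statistic studied in the paper.
    ASSUMPTION: item parameters are drawn from hypothetical distributions.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_reps):
        # Draw a fresh set of hypothetical item parameters per replication
        items = [(rng.uniform(0.5, 2.0), rng.gauss(0.0, 1.0))
                 for _ in range(n_items)]
        control = simulate_arm(n_patients, items, 0.0, rng)
        treated = simulate_arm(n_patients, items, effect, rng)
        se = math.sqrt(variance(control) / n_patients
                       + variance(treated) / n_patients)
        z = (mean(treated) - mean(control)) / se if se > 0 else 0.0
        rejections += abs(z) > z_crit
    return rejections / n_reps
```

Calling, for example, `estimate_power(n_patients=100, n_items=20, effect=0.5)` returns the simulated rejection rate for that design. Because the test and the item parameters differ from those in the paper, the numbers it produces should not be expected to match the sample sizes reported in the abstract.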

Similar articles

Power and Sample Size Calculations in Clinical Trials with Patient-Reported Outcomes under Equal and Unequal Group Sizes Based on Graded Response Model: A Simulation Study.

Value Health. 2016 Jul-Aug;19(5):639-47. doi: 10.1016/j.jval.2016.03.1857. Epub 2016 Jul 29.

Analysis of longitudinal randomized clinical trials using item response models.

Contemp Clin Trials. 2009 Mar;30(2):158-70. doi: 10.1016/j.cct.2008.12.003. Epub 2008 Dec 24.

Multiple imputation for patient reported outcome measures in randomised controlled trials: advantages and disadvantages of imputing at the item, subscale or composite score level.

BMC Med Res Methodol. 2018 Aug 28;18(1):87. doi: 10.1186/s12874-018-0542-6.

Estimating power for clinical trials with Patient Reported Outcomes - using Item Response Theory.

J Clin Epidemiol. 2022 Jan;141:141-148. doi: 10.1016/j.jclinepi.2021.10.002. Epub 2021 Oct 11.

Developing a health-related quality-of-life measure for end-stage renal disease: The CHOICE Health Experience Questionnaire.

Am J Kidney Dis. 2001 Jan;37(1):11-21. doi: 10.1053/ajkd.2001.20631.

Towards power and sample size calculations for the comparison of two groups of patients with item response theory models.

Stat Med. 2012 May 20;31(11-12):1277-90. doi: 10.1002/sim.4387. Epub 2011 Nov 8.

Methodological issues regarding power of classical test theory (CTT) and item response theory (IRT)-based approaches for the comparison of patient-reported outcomes in two groups of patients--a simulation study.

BMC Med Res Methodol. 2010 Mar 25;10:24. doi: 10.1186/1471-2288-10-24.

Measurement model choice influenced randomized controlled trial results.

J Clin Epidemiol. 2016 Nov;79:140-149. doi: 10.1016/j.jclinepi.2016.06.011. Epub 2016 Jul 7.

Evaluating measurement equivalence using the item response theory log-likelihood ratio (IRTLR) method to assess differential item functioning (DIF): applications (with illustrations) to measures of physical functioning ability and general distress.

Qual Life Res. 2007;16 Suppl 1:43-68. doi: 10.1007/s11136-007-9186-4. Epub 2007 May 5.

Cited by

Bayesian item response theory to estimate power in clinical trials with patient-reported outcomes as endpoints.

Qual Life Res. 2025 Apr;34(4):1113-1124. doi: 10.1007/s11136-024-03874-y. Epub 2025 Jan 8.

Power Analysis for the Wald, LR, Score, and Gradient Tests in a Marginal Maximum Likelihood Framework: Applications in IRT.

Psychometrika. 2023 Dec;88(4):1249-1298. doi: 10.1007/s11336-022-09883-5. Epub 2022 Aug 27.

Item Response Theory Modeling of the International Prostate Symptom Score in Patients with Lower Urinary Tract Symptoms Associated with Benign Prostatic Hyperplasia.

AAPS J. 2020 Aug 27;22(5):115. doi: 10.1208/s12248-020-00500-w.

Measuring physical and mental health during pregnancy and postpartum in an Australian childbearing population - validation of the PROMIS Global Short Form.

BMC Pregnancy Childbirth. 2019 Oct 22;19(1):370. doi: 10.1186/s12884-019-2546-6.

Some recommendations for developing multidimensional computerized adaptive tests for patient-reported outcomes.

Qual Life Res. 2018 Apr;27(4):1055-1063. doi: 10.1007/s11136-018-1821-8. Epub 2018 Feb 23.

Effects of Education on Differential Item Functioning on the 15-Item Modified Korean Version of the Boston Naming Test.

Psychiatry Investig. 2017 Mar;14(2):126-135. doi: 10.4306/pi.2017.14.2.126. Epub 2017 Mar 6.

The Impact of Non-attempted and Dually-Attempted Items on Person Abilities Using Item Response Theory.

Front Psychol. 2016 Oct 14;7:1572. doi: 10.3389/fpsyg.2016.01572. eCollection 2016.

The (mis)measurement of the Dark Triad Dirty Dozen: exploitation at the core of the scale.

PeerJ. 2016 Mar 1;4:e1748. doi: 10.7717/peerj.1748. eCollection 2016.

Power and sample size determination for the group comparison of patient-reported outcomes using the Rasch model: impact of a misspecification of the parameters.

BMC Med Res Methodol. 2015 Mar 15;15:21. doi: 10.1186/s12874-015-0011-4.

The association of dysmenorrhea with noncyclic pelvic pain accounting for psychological factors.

Am J Obstet Gynecol. 2013 Nov;209(5):422.e1-422.e10. doi: 10.1016/j.ajog.2013.08.020. Epub 2013 Aug 22.
