Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Carrboro, NC, USA.
Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, USA.
Stat Methods Med Res. 2021 Dec;30(12):2604-2618. doi: 10.1177/09622802211043263. Epub 2021 Oct 7.
The use of patient-reported outcome measures is gaining popularity in clinical trials for comparing patient groups. Such comparisons typically focus on differences in group means and are carried out using either a traditional sum-score-based approach or item response theory (IRT)-based approaches. Several simulation studies have evaluated group mean comparison approaches, but the performance of these approaches remains unknown under conditions those studies did not investigate (e.g., under the impact of differential item functioning (DIF)). By incorporating some of these previously uninvestigated simulation features, the current study examines Type I error, statistical power, and effect size estimation accuracy associated with group mean comparisons using simple sum scores, IRT model likelihood ratio tests, and IRT expected-a-posteriori scores. Manipulated features include sample size per group, number of items, number of response categories, strength of discrimination parameters, location of thresholds, impact of DIF, and presence of missing data. Results are summarized and visualized using decision trees.
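For readers unfamiliar with this kind of simulation, the sum-score arm of such a study can be illustrated with a minimal sketch. It is not the authors' code or design. Every specific below is an assumption made for illustration: the graded response model as the data-generating model, the discrimination value of 1.5, the thresholds (-1, 0, 1), the sample sizes, and the large-sample normal approximation used in place of a t-test. DIF, missing data, and the IRT-based methods are not covered.

```python
import math
import random
from statistics import NormalDist, mean, variance


def simulate_sum_scores(n, theta_mean, discriminations, thresholds, rng):
    """Draw n respondents with latent trait N(theta_mean, 1); return sum scores."""
    scores = []
    for _ in range(n):
        theta = rng.gauss(theta_mean, 1.0)
        total = 0
        for a in discriminations:
            u = rng.random()
            # Graded response model: P(X >= k) = logistic(a * (theta - b_k)).
            # Counting the cumulative probabilities that exceed one uniform
            # draw gives the response category (thresholds are increasing).
            total += sum(
                u < 1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds
            )
        scores.append(total)
    return scores


def two_sided_p(x, y):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = (mean(x) - mean(y)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def rejection_rate(delta, n_per_group=100, n_items=5, n_reps=200,
                   alpha=0.05, seed=1):
    """Share of replications rejecting H0 when the latent means differ by delta.

    delta = 0 estimates the Type I error rate; delta != 0 estimates power.
    """
    rng = random.Random(seed)
    discriminations = [1.5] * n_items   # assumed, not taken from the study
    thresholds = [-1.0, 0.0, 1.0]       # assumed: four response categories
    rejections = 0
    for _ in range(n_reps):
        group1 = simulate_sum_scores(n_per_group, 0.0, discriminations,
                                     thresholds, rng)
        group2 = simulate_sum_scores(n_per_group, delta, discriminations,
                                     thresholds, rng)
        rejections += two_sided_p(group1, group2) < alpha
    return rejections / n_reps


if __name__ == "__main__":
    print("Estimated Type I error:", rejection_rate(0.0))
    print("Estimated power at delta = 0.5:", rejection_rate(0.5))
```

Varying `n_per_group`, `n_items`, the number of thresholds, and the discrimination values corresponds to manipulating the simulation features listed in the abstract, one condition at a time.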