Haem Elham, Karlsson Mats O, Ueckert Sebastian
Department of Biostatistics, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran.
Pharmacometrics Research Group, Department of Pharmacy, Uppsala University, Uppsala, Sweden.
J Pharmacokinet Pharmacodyn. 2024 Dec 10;52(1):4. doi: 10.1007/s10928-024-09949-0.
Composite scale data consist of numerous categorical questions/items that are often summed into a total score, and these total scores are commonly used as primary endpoints in clinical trials. Such endpoints are discrete and bounded by nature. Item response theory (IRT) is a powerful approach for detecting drug effects in composite scale data from clinical trials, but estimating all parameters requires a large sample size and complete item-level information, which may not be available. Therefore, total score models are often used instead. The most popular total score models are continuous variable (CV) models, but this strategy makes assumptions that conflict with the integer nature, and typically also the bounded nature, of the data. Bounded integer (BI) and coarsened grid (CG) models respect the nature of the data. However, their power to detect drug effects in clinical trials has not been studied as thoroughly. When an IRT model is available, IRT-informed models (I-BI and I-CV) are promising methods in which the mean and variability of the total score at any position are extracted from the existing IRT model. In this study, total score data were simulated from the MDS-UPDRS motor subscale. The power, type I error, and treatment effect bias of six total score models for detecting drug effects in clinical trials were then explored. Further, it was investigated how the power, type I error, and treatment effect bias of the I-BI and I-CV models were affected by mis-specified item information from the IRT model. The I-BI model demonstrated the highest statistical power, maintained an acceptable type I error rate, and exhibited minimal bias, approaching zero. The I-CV, BI, and CG with Czado transformation (CG_Czado) models provided the next-highest power. However, the CG_Czado model had inflated type I error in scenarios with a small sample size per arm. Among the total score models, the CG model displayed the lowest power and the most inflated type I error. Therefore, the results favor the I-BI model when an IRT model is available and the BI model otherwise.
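As a rough illustration of how power and type I error can be estimated by simulation, the following minimal sketch repeatedly simulates two-arm trials with bounded integer total scores and counts how often a test rejects the null hypothesis. It does not implement the I-BI, I-CV, BI, CG, or CG_Czado models from the study. The data generator (a rounded, clipped normal), the two-sample z-test used as a stand-in analysis, and all numerical settings (`MAX_SCORE`, baseline mean and SD, effect size, sample size, number of replicates) are assumptions chosen for illustration only.

```python
import random
from statistics import NormalDist, mean, variance

MAX_SCORE = 132  # assumed upper bound of the total score (illustrative)


def simulate_arm(rng, n, latent_mean, latent_sd):
    """Draw n bounded integer total scores by rounding and clipping a normal draw."""
    scores = []
    for _ in range(n):
        raw = rng.gauss(latent_mean, latent_sd)
        scores.append(min(MAX_SCORE, max(0, round(raw))))
    return scores


def two_sided_p(placebo, active):
    """Two-sample z-test on mean scores (stand-in for a model-based test)."""
    se = (variance(placebo) / len(placebo) + variance(active) / len(active)) ** 0.5
    if se == 0:
        return 1.0
    z = (mean(active) - mean(placebo)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(effect, n_per_arm=50, n_trials=500, alpha=0.05, seed=1):
    """Fraction of simulated trials in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        placebo = simulate_arm(rng, n_per_arm, 40.0, 12.0)
        active = simulate_arm(rng, n_per_arm, 40.0 + effect, 12.0)
        if two_sided_p(placebo, active) < alpha:
            rejections += 1
    return rejections / n_trials


# With no true drug effect, the rejection rate estimates the type I error.
type1_error = rejection_rate(effect=0.0)
# With a true drug effect, the rejection rate estimates the power.
power = rejection_rate(effect=-6.0)
print(f"type I error ~ {type1_error:.3f}, power ~ {power:.3f}")
```

Swapping `two_sided_p` for a test based on a different total score model, while keeping the simulation loop fixed, is one way such models could be compared on the same simulated trials.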