Barakat Ahmed, Evans Jonathan, Gibbons Christopher, Singh Harvinder P
University Hospitals of Leicester NHS Trust, Leicester, UK.
University of Exeter, Exeter, UK.
Bone Joint Res. 2024 Aug 5;13(8):392-400. doi: 10.1302/2046-3758.138.BJR-2023-0412.R1.
The Oxford Shoulder Score (OSS) is a 12-item measure commonly used for the assessment of shoulder surgeries. This study explores whether computerized adaptive testing (CAT) provides a shortened, individually tailored questionnaire while maintaining test accuracy.
A total of 16,238 preoperative OSS questionnaires were available in the National Joint Registry (NJR) for England, Wales, Northern Ireland, the Isle of Man, and the States of Guernsey dataset (April 2012 to April 2022). Prior to CAT, the foundational item response theory (IRT) assumptions of unidimensionality, monotonicity, and local independence were tested. CAT simulations used sequential item selection with stopping criteria set at standard error (SE) < 0.32 and SE < 0.45 (equivalent to reliability coefficients of 0.90 and 0.80, respectively), and their precision was compared with that of the full-length patient-reported outcome measure (PROM).
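The correspondence between the SE thresholds and the reliability coefficients can be checked with the common IRT approximation reliability = 1 − SE², which assumes the latent trait is scaled to unit variance. The sketch below is illustrative only; the function names are hypothetical and are not taken from the study.

```python
import math


def reliability_from_se(se: float) -> float:
    """Approximate reliability for a given SE, assuming a unit-variance latent scale."""
    return 1.0 - se ** 2


def se_from_reliability(reliability: float) -> float:
    """Inverse of the approximation above: the SE implied by a target reliability."""
    return math.sqrt(1.0 - reliability)


# The two stopping thresholds quoted in the abstract.
for se in (0.32, 0.45):
    print(f"SE {se:.2f} -> reliability {reliability_from_se(se):.2f}")
```

Under this approximation, the stricter threshold (SE < 0.32) demands more information about the respondent and therefore tends to require more items before the test can stop.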
Confirmatory factor analysis (CFA) for unidimensionality showed satisfactory fit on the standardized root mean square residual (SRMR) of 0.06 (cut-off ≤ 0.08), but not on the comparative fit index (CFI) of 0.85 or the Tucker-Lewis index (TLI) of 0.82 (cut-off > 0.90). Monotonicity, measured by the H value, was 0.482, signifying good monotonic trends. Local independence was generally met, with Yen's Q3 statistic > 0.2 for most items. The median item count for completing the CAT simulation with an SE of 0.32 was 3 (IQR 3 to 12), while for an SE of 0.45 it was 2 (IQR 2 to 6). These medians constitute only 25% and 17%, respectively, of the 12-item full-length questionnaire.
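How a CAT arrives at an item count like those reported above can be sketched with a toy simulation. This is a minimal sketch under loud assumptions: it uses a dichotomous two-parameter logistic (2PL) model rather than a polytomous model suited to the OSS response options, the item parameters in `ITEMS` are made up for illustration, and the maximum-information selection and grid-based posterior estimate are generic choices, not the authors' documented procedure.

```python
import math

# Hypothetical 2PL parameters (discrimination a, difficulty b) for a 12-item
# scale. Made-up values for illustration, NOT the calibrated OSS parameters.
ITEMS = [(2.5, -1.5), (2.2, -1.0), (2.8, -0.5), (3.0, 0.0), (2.6, 0.5), (2.4, 1.0),
         (2.0, 1.5), (1.8, -2.0), (1.6, 2.0), (2.1, -0.25), (2.3, 0.25), (1.9, 0.75)]
GRID = [g / 10.0 for g in range(-40, 41)]  # latent trait grid from -4 to 4


def prob(theta, a, b):
    """2PL probability of endorsing an item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = prob(theta, a, b)
    return a * a * p * (1.0 - p)


def eap(responses):
    """Posterior mean and SD of theta on GRID under a standard normal prior."""
    weights = []
    for g in GRID:
        w = math.exp(-0.5 * g * g)  # unnormalised standard normal prior
        for idx, x in responses:
            p = prob(g, *ITEMS[idx])
            w *= p if x == 1 else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(g * w for g, w in zip(GRID, weights)) / total
    var = sum((g - mean) ** 2 * w for g, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(answer, se_stop):
    """Administer items one at a time until SE < se_stop or items run out."""
    responses = []
    theta, se = 0.0, 1.0  # start at the prior mean and prior SD
    remaining = set(range(len(ITEMS)))
    while remaining and se >= se_stop:
        # Pick the unused item that is most informative at the current estimate.
        idx = max(sorted(remaining), key=lambda i: information(theta, *ITEMS[i]))
        remaining.remove(idx)
        responses.append((idx, answer(idx)))
        theta, se = eap(responses)
    return theta, se, len(responses)


# Deterministic toy respondent with a hypothetical true trait level of 0.3:
# endorses every item whose difficulty lies below that level.
def respondent(idx):
    return int(0.3 > ITEMS[idx][1])


for se_stop in (0.45, 0.32):
    theta, se, n = run_cat(respondent, se_stop)
    share = 100 * n / len(ITEMS)
    print(f"stop at SE < {se_stop}: {n} of {len(ITEMS)} items ({share:.0f}%), SE = {se:.2f}")
```

Because both runs follow the same item sequence until the looser threshold is met, the stricter threshold can never use fewer items, which mirrors the ordering of the two medians in the results.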
Calibrating IRT for the OSS has resulted in the development of an efficient and shortened CAT while maintaining accuracy and reliability. Through the reduction of redundant items and implementation of a standardized measurement scale, our study highlights a promising approach to alleviate time burden and potentially enhance compliance with these widely used outcome measures.