Sharafi Zahra, Mousavi Amin, Ayatollahi Seyyed Mohammad Taghi, Jafari Peyman
Department of Biostatistics, Faculty of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran.
Department of Educational Psychology and Special Education, College of Education, University of Saskatchewan, Saskatoon, SK, Canada.
Comput Math Methods Med. 2017;2017:7571901. doi: 10.1155/2017/7571901. Epub 2017 Sep 12.
The purpose of this study was to evaluate the effectiveness of two methods for detecting differential item functioning (DIF) in the presence of multilevel data and polytomously scored items. The assessment of DIF with multilevel data (e.g., patients nested within hospitals, hospitals nested within districts) from large-scale assessment programs has received considerable attention, but very few studies have evaluated the effect of the hierarchical structure of the data on DIF detection for polytomously scored items.
Ordinal logistic regression (OLR) and hierarchical ordinal logistic regression (HOLR) were used to assess DIF in simulated and real multilevel polytomous data. The simulation study used a fully crossed design with six factors: DIF magnitude, grouping variable, intraclass correlation coefficient, number of clusters, number of participants per cluster, and item discrimination parameter. In addition, data from the Pediatric Quality of Life Inventory™ (PedsQL™) 4.0, collected from 576 healthy school children, were analyzed.
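For orientation, the two models can be written in a common cumulative-logit form. This is a minimal sketch that assumes the usual logistic-regression DIF parameterization; the abstract does not give the exact specification, so the symbols below (matching variable X, group indicator G, thresholds alpha_k, cluster effect u_j with variance tau^2) are illustrative assumptions and not the authors' notation.

```latex
% OLR (single level): respondent i, response category k
\operatorname{logit} P(Y_{i} \le k)
  = \alpha_k + \beta_1 X_{i} + \beta_2 G_{i} + \beta_3 X_{i} G_{i}

% HOLR (two level): respondent i nested in cluster j,
% with a random intercept u_j for the cluster
\operatorname{logit} P(Y_{ij} \le k)
  = \alpha_k + \beta_1 X_{ij} + \beta_2 G_{ij} + \beta_3 X_{ij} G_{ij} + u_j,
\qquad u_j \sim N(0, \tau^2)

% Latent-scale intraclass correlation for a random-intercept logistic model
\rho = \frac{\tau^2}{\tau^2 + \pi^2 / 3}
```

Under this sketch, a nonzero beta_2 corresponds to uniform DIF and a nonzero beta_3 to nonuniform DIF. The two models differ only in the cluster term u_j, so they coincide as tau^2 approaches zero.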
Overall, the results indicate that the two methods performed equivalently in terms of Type I error control and DIF detection power.
The current study showed a negligible difference between OLR and HOLR in detecting DIF for polytomously scored items in hierarchically structured data. Implications and considerations for analyzing real data are also discussed.