Fletcher Jack M, Stuebing Karla K, Barth Amy E, Miciak Jeremy, Francis David J, Denton Carolyn A
Texas Institute of Measurement, Evaluation, and Statistics and Department of Psychology, University of Houston.
Children's Learning Institute, University of Texas Health Science Center at Houston.
Top Lang Disord. 2014 Jan;34(1):74-89. doi: 10.1097/TLD.0000000000000004.
Agreement across methods for identifying students as inadequate responders or as learning disabled is often poor. We report (1) an empirical examination of two approaches to assessing response to intervention: final status methods (post-intervention benchmarks) and dual-discrepancy methods (based on both growth during the intervention and final status); and (2) a statistical simulation of psychometric issues that may explain low agreement.
After a Tier 2 intervention, final status benchmark criteria were used to identify 104 inadequate and 85 adequate responders to intervention, with comparisons of agreement and coverage for these methods and a dual-discrepancy method. Factors affecting agreement were investigated using computer simulation to manipulate reliability, the intercorrelation between measures, cut points, normative samples, and sample size.
When inadequate responders were identified from individual measures, any single measure tended to miss many members of the pool of 104 inadequate responders. Agreement between pairs of measures in identifying inadequate responders was poor to fair. In the simulation, comparisons across two simulated measures generated indices of agreement (kappa) that were generally low because of multiple psychometric issues inherent in any test.
Expecting excellent agreement between two correlated tests with even small amounts of unreliability may not be realistic. Assessing outcomes based on multiple measures, such as level of curriculum-based measurement (CBM) performance and short norm-referenced assessments of fluency, may improve the reliability of diagnostic decisions.
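The kind of simulation described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the normal-score model, the default cut point (roughly the 25th percentile of a standard normal), and the sample size are all assumptions chosen for illustration. Two latent abilities are drawn with a chosen correlation, measurement error is added according to a chosen reliability, each observed score is dichotomized at the cut point, and Cohen's kappa is computed on the two resulting classifications.

```python
import math
import random


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of boolean classifications."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    p_a = sum(a) / n  # proportion flagged by measure A
    p_b = sum(b) / n  # proportion flagged by measure B
    p_chance = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_chance == 1:
        return 1.0  # degenerate case: both measures flag everyone or no one
    return (p_observed - p_chance) / (1 - p_chance)


def simulate_kappa(reliability, true_corr, cut_z=-0.674, n=20000, seed=0):
    """Hypothetical helper: kappa between two dichotomized, error-laden measures.

    reliability: share of observed-score variance due to the true score (assumed
                 equal for both measures).
    true_corr:   correlation between the two latent (error-free) scores.
    cut_z:       cut point in z-score units; scores below it are flagged as
                 inadequate response (assumed value, about the 25th percentile).
    """
    rng = random.Random(seed)
    flags_x, flags_y = [], []
    for _ in range(n):
        # Correlated latent scores, each with unit variance.
        t1 = rng.gauss(0, 1)
        t2 = true_corr * t1 + math.sqrt(1 - true_corr ** 2) * rng.gauss(0, 1)
        # Observed scores: true score plus independent measurement error,
        # scaled so that each observed score also has unit variance.
        x = math.sqrt(reliability) * t1 + math.sqrt(1 - reliability) * rng.gauss(0, 1)
        y = math.sqrt(reliability) * t2 + math.sqrt(1 - reliability) * rng.gauss(0, 1)
        flags_x.append(x < cut_z)
        flags_y.append(y < cut_z)
    return cohen_kappa(flags_x, flags_y)
```

Calling `simulate_kappa` over a grid of `reliability`, `true_corr`, `cut_z`, and `n` values gives a rough sense of how each factor moves kappa. Under this model the observed correlation between the two measures is `reliability * true_corr`, so even modest unreliability pulls the classifications apart near the cut point.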