Moses Tim, Kim Sooyeon
College Board, Newtown, PA, USA.
Educational Testing Service, Princeton, NJ, USA.
Appl Psychol Meas. 2015 Jun;39(4):314-329. doi: 10.1177/0146621614563067. Epub 2014 Dec 22.
The purpose of this study was to propose extensions of reliability estimation methods that could be used to determine the conditions under which single scoring of constructed-response (CR) items is as effective as double scoring in mixed-format licensure tests. Multivariate generalizability theory methods, traditionally used to estimate overall composite score reliability, were extended with simulations so that classification consistency and classification accuracy estimates could also be obtained. Composite score reliabilities, classification consistencies, and classification accuracies were estimated based on the double and single scoring of the CR items of three licensure tests. These quantities were also estimated in decision studies that considered varied testing situations, such as different numbers of CR items and different CR section weights.
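The abstract does not spell out the model. The following is therefore only a minimal sketch of the general idea under simplifying assumptions of our own, not the authors' procedure. It assumes a two-section composite (multiple-choice plus CR), a p x i x r design for the CR section in which averaging over more raters shrinks only the rater-related error terms, uncorrelated errors across sections, and normally distributed true and error scores. Classification consistency and accuracy are then obtained by simulating a true score and two independent observed scores per examinee and comparing each against a cut score. The function names, parameters, variance components, and cut score are all hypothetical and illustrative; none are values from the study.

```python
import math
import random


def composite_variances(var_mc, var_cr, cov_mc_cr, err_mc,
                        var_pi, var_pr, var_pir, n_cr_items, n_raters,
                        w_mc, w_cr):
    """Return (true_var, error_var) for a weighted MC + CR composite.

    Hypothetical simplified model: the CR section's relative error variance
    follows a p x i x r G-study design,
        sigma2(pi)/n_i + sigma2(pr)/n_r + sigma2(pir,e)/(n_i * n_r),
    and errors are assumed uncorrelated across the two sections.
    """
    err_cr = (var_pi / n_cr_items + var_pr / n_raters
              + var_pir / (n_cr_items * n_raters))
    true_var = (w_mc ** 2 * var_mc + w_cr ** 2 * var_cr
                + 2 * w_mc * w_cr * cov_mc_cr)
    error_var = w_mc ** 2 * err_mc + w_cr ** 2 * err_cr
    return true_var, error_var


def classification_indices(true_var, error_var, cut, n_examinees=50_000,
                           seed=1):
    """Estimate classification consistency and accuracy by simulation.

    True composite scores are drawn from N(0, true_var); each examinee gets
    two independent observed scores with N(0, error_var) errors (normality
    is an assumption of this sketch). Consistency is the share classified
    the same way on both replications; accuracy is the share whose first
    observed classification matches the true-score classification.
    """
    rng = random.Random(seed)
    sd_t, sd_e = math.sqrt(true_var), math.sqrt(error_var)
    consistent = accurate = 0
    for _ in range(n_examinees):
        t = rng.gauss(0.0, sd_t)
        x1 = t + rng.gauss(0.0, sd_e)  # first replication
        x2 = t + rng.gauss(0.0, sd_e)  # independent second replication
        consistent += (x1 >= cut) == (x2 >= cut)
        accurate += (x1 >= cut) == (t >= cut)
    return consistent / n_examinees, accurate / n_examinees


# Illustrative (made-up) variance components, not values from the study.
PARAMS = dict(var_mc=1.0, var_cr=0.8, cov_mc_cr=0.6, err_mc=0.15,
              var_pi=0.3, var_pr=0.05, var_pir=0.4, n_cr_items=4,
              w_mc=0.6, w_cr=0.4)

if __name__ == "__main__":
    for n_raters, label in ((1, "single"), (2, "double")):
        tv, ev = composite_variances(n_raters=n_raters, **PARAMS)
        cc, ca = classification_indices(tv, ev, cut=0.5)
        print(f"{label} scoring: reliability={tv / (tv + ev):.3f}, "
              f"consistency={cc:.3f}, accuracy={ca:.3f}")
```

With these made-up inputs, composite reliability is about 0.896 under single scoring and about 0.909 under double scoring. Changing `n_cr_items`, `w_mc`, and `w_cr` mimics, in a very simplified way, the kind of decision study the abstract describes.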