Multivariate Behav Res. 2001 Oct 1;36(4):523-62. doi: 10.1207/S15327906MBR3604_03.
The present study compared the fit of several IRT models to two personality assessment instruments. Data from 13,059 individuals responding to the US-English version of the Fifth Edition of the Sixteen Personality Factor Questionnaire (16PF) and 1,770 individuals responding to Goldberg's 50-item Big Five Personality measure were analyzed. Various issues pertaining to the fit of the IRT models to personality data were considered. We examined two of the most popular parametric models designed for dichotomously scored items (i.e., the two- and three-parameter logistic models) and a parametric model for polytomous items (Samejima's graded response model). Also examined were Levine's nonparametric maximum likelihood formula scoring models for dichotomous and polytomous data, which were previously found to provide good fits to several cognitive ability tests (Drasgow, Levine, Tsien, Williams, & Mead, 1995). The two- and three-parameter logistic models fit some scales reasonably well but not others; the graded response model generally did not fit well. The nonparametric formula scoring models provided the best fit of the models considered. Several implications of these findings for personality measurement and personnel selection are discussed.
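For readers unfamiliar with the parametric models named in the abstract, the following is a minimal sketch of their standard textbook forms: the two- and three-parameter logistic models for dichotomous items and Samejima's graded response model for polytomous items. The function names and the parameter values in the usage lines are illustrative assumptions, not taken from the study, and the sketch omits the optional scaling constant (D = 1.7) that some presentations include. It does not cover Levine's nonparametric formula scoring models.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def p_3pl(theta, a, b, c=0.0):
    """Probability of a keyed response under the three-parameter logistic model.

    theta: latent trait level; a: discrimination; b: location (difficulty);
    c: lower asymptote. Setting c = 0 gives the two-parameter logistic model.
    """
    return c + (1.0 - c) * logistic(a * (theta - b))


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    thresholds must be strictly increasing; an item with m thresholds
    has m + 1 ordered response categories.
    """
    # Boundary curves P(X >= k), padded with 1 (lowest) and 0 (above highest).
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent boundary curves.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Illustrative (hypothetical) item parameters, not estimates from the study.
p2 = p_3pl(theta=0.0, a=1.2, b=0.0)          # 2PL: c defaults to 0
p3 = p_3pl(theta=0.0, a=1.2, b=0.0, c=0.2)   # 3PL with a nonzero lower asymptote
probs = grm_category_probs(theta=0.5, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

In this sketch the 2PL is treated as a special case of the 3PL rather than as a separate function, which keeps the relationship between the two dichotomous models explicit.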