Wu Wei, Gu Fei, Fukui Sadaaki
Indiana University Purdue University Indianapolis, Indianapolis, IN, USA.
Virginia Tech, Blacksburg, VA, USA.
Behav Res Methods. 2022 Apr;54(2):922-940. doi: 10.3758/s13428-021-01671-w. Epub 2021 Aug 6.
Large-scale surveys are common in social and behavioral science research. Missing data often occur at the item level because of nonresponse or planned missing data designs. In practice, item scores are typically aggregated into scale scores (i.e., sum or mean scores) for further analyses. Although several strategies for handling item-level missing data have been proposed, most are not easy to implement, especially for applied researchers. Using Monte Carlo simulations, we examined a practical hybrid approach to item-level missing data in Likert scale items under varying numbers of response categories (i.e., four, five, and seven) and varying missing data mechanisms. Specifically, the approach first uses proration to calculate a participant's scale score if a certain proportion of item scores is available (the proration cutoff criterion) and then uses full information maximum likelihood to handle missing data at the scale level when scale scores cannot be computed under the selected proration cutoff. Our simulation results showed that the hybrid approach was generally acceptable when the missing data were randomly spread over the items, even when the items had different thresholds/means and loadings; caution is warranted when the missingness is determined by one of the scale items. Based on the results, we recommend a proration cutoff of 30% or 40% when the sample size is small and 40% or 50% when the sample size is moderate or large.
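The proration step of the hybrid approach can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `prorated_mean`, the use of `None` for a missing item, and the reading of the cutoff as the minimum proportion of observed items are all assumptions made for illustration. Scale scores returned as `None` are the ones that would be left missing and handled at the scale level by software that supports full information maximum likelihood.

```python
from typing import Optional, Sequence


def prorated_mean(items: Sequence[Optional[float]],
                  cutoff: float = 0.4) -> Optional[float]:
    """Hypothetical helper: prorated mean scale score for one participant.

    Assumption: `cutoff` is the minimum proportion of items that must be
    observed for proration. If fewer are observed, the scale score is left
    missing (None) so it can be handled later at the scale level.
    """
    if not items:
        return None
    observed = [x for x in items if x is not None]
    if not observed or len(observed) / len(items) < cutoff:
        return None
    # Proration: average only the items the participant answered.
    return sum(observed) / len(observed)


# Three illustrative participants on a five-item scale (made-up responses).
responses = [
    [4, 5, None, 3, 4],           # 4 of 5 items observed: prorated
    [None, None, None, 2, None],  # 1 of 5 observed: below cutoff, left missing
    [None, None, 5, 3, None],     # 2 of 5 observed: exactly at the 40% cutoff
]
scale_scores = [prorated_mean(row, cutoff=0.4) for row in responses]
print(scale_scores)
```

Under these assumptions, a lower cutoff prorates more participants and leaves fewer scale scores missing.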