You Xiaofeng, Yang Jianqin, Xu Xinai
School of Mathematics and Information Science, Nanchang Normal University, Nanchang, China.
Department of Educational Psychology, Faculty of Education, East China Normal University, Shanghai, China.
Front Psychol. 2025 Aug 5;16:1487111. doi: 10.3389/fpsyg.2025.1487111. eCollection 2025.
The handling of missing data in cognitive diagnostic assessment is an important issue. The Random Forest Threshold Imputation (RFTI) method proposed by You et al. in 2023 is specifically designed for cognitive diagnostic models (CDMs) and is built on random forest imputation. However, in RFTI, the threshold for setting an imputed value to 0 is fixed at 0.5, which may introduce uncertainty into the imputation. To address this issue, we propose an improved method, Random Forest Dynamic Threshold Imputation (RFDTI), which uses two dynamic thresholds for dichotomous imputed values. A simulation study showed that the classification of attribute profiles when using RFDTI to impute missing data was always better than with four commonly used traditional methods (i.e., person mean imputation, two-way imputation, the expectation-maximization algorithm, and multiple imputation). Compared with RFTI, RFDTI was slightly better for data missing at random (MAR) or missing completely at random (MCAR), but slightly worse for data missing not at random (MNAR) or with MIXED missingness mechanisms, especially at larger missingness proportions. An empirical example with MNAR data demonstrates the applicability of RFDTI, which performed similarly to RFTI and much better than the four traditional methods. An R package is provided to facilitate the application of the proposed method.