Buczak Philip
Department of Statistics, TU Dortmund University, Dortmund, Germany.
Research Center Trustworthy Data Science and Security, UA Ruhr, Dortmund, Germany.
Br J Math Stat Psychol. 2025 May;78(2):594-616. doi: 10.1111/bmsp.12375. Epub 2024 Dec 8.
Ordinal responses are common in psychology, for example as school grades or rating scales. Whereas parametric statistical models such as the proportional odds model have traditionally been used, machine learning (ML) methods such as random forest (RF) are increasingly employed for ordinal prediction. With new developments in assessment and new data sources yielding increasing quantities of data in the psychological sciences, such ML approaches promise high predictive performance. Because RF does not inherently account for ordinality, several extensions have been proposed. A promising approach assigns optimized numeric scores to the ordinal response categories and then fits a regression RF. However, these optimization procedures are computationally expensive and have been shown to yield only situational benefit. In this work, I propose Frequency-Adjusted Borders Ordinal Forest (fabOF), a novel tree ensemble method for ordinal prediction that forgoes extensive optimization while offering improved predictive performance in simulation and in an illustrative example of student performance. To aid interpretation, I additionally introduce a permutation variable importance measure for fabOF tailored towards ordinal prediction. When applied to the illustrative example, interest in higher education, mother's education, and study time are identified as important predictors of student performance. The presented methodology is made available through an accompanying R package.
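The abstract does not spell out how the category borders are obtained, so the following is only a minimal sketch of one plausible reading of "frequency-adjusted borders", not the author's implementation. It assumes the categories are coded 1..k, that a regression forest has already produced one numeric prediction per training observation (for example out-of-bag predictions), and that each border is placed at the empirical quantile of those predictions matching the cumulative relative frequency of the categories below it. The function names and the example numbers are hypothetical and made up for illustration; the forest-fitting step itself is omitted.

```python
from bisect import bisect_left
from collections import Counter
from itertools import accumulate


def frequency_adjusted_borders(train_labels, numeric_preds, n_categories):
    """Return n_categories - 1 borders on the numeric prediction scale.

    Assumption: border j is the empirical quantile of numeric_preds at the
    cumulative frequency of categories 1..j observed in train_labels.
    """
    if len(train_labels) != len(numeric_preds):
        raise ValueError("need one numeric prediction per training label")
    counts = Counter(train_labels)
    # Cumulative counts of categories 1..k, e.g. [3, 5, 10].
    cum_counts = list(
        accumulate(counts.get(c, 0) for c in range(1, n_categories + 1))
    )
    sorted_preds = sorted(numeric_preds)
    borders = []
    for cum in cum_counts[:-1]:
        # Largest prediction still assigned to the lower category;
        # -inf if no training observation falls at or below this category.
        borders.append(sorted_preds[cum - 1] if cum > 0 else float("-inf"))
    return borders


def predict_category(numeric_pred, borders):
    """Map a numeric prediction to a category in 1..k.

    A prediction equal to a border stays in the lower category.
    """
    return 1 + bisect_left(borders, numeric_pred)


# Hypothetical training labels (3 categories) and made-up numeric predictions.
labels = [1, 1, 1, 2, 2, 3, 3, 3, 3, 3]
preds = [1.2, 1.4, 1.1, 1.9, 2.1, 2.6, 2.4, 2.8, 2.7, 2.5]

borders = frequency_adjusted_borders(labels, preds, n_categories=3)
predicted = [predict_category(p, borders) for p in preds]
```

With these made-up values, the borders land at 1.4 and 2.1 rather than at the rounding borders 1.5 and 2.5, so a new numeric prediction of 2.3 is mapped to category 3 instead of category 2. By construction, the categories predicted for the training predictions reproduce the observed category frequencies.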