Department of Linguistics, University of California, Davis, Kerr Hall, Davis, CA, 95616, USA.
Department of Linguistics, University of Oregon, 1290 University of Oregon, Eugene, OR, 97403, USA.
Behav Res Methods. 2024 Sep;56(6):5557-5587. doi: 10.3758/s13428-023-02287-y. Epub 2023 Nov 28.
As mixed-effects regression models have become a mainstream tool in psycholinguistics, there is an increasing need to understand them more fully. In the last decade, most work on mixed-effects models in psycholinguistics has focused on properly specifying the random-effects structure to minimize error in evaluating the statistical significance of fixed-effects predictors. In the context of logistic regression, the present study examines a potential misspecification of random effects that has not previously been discussed in psycholinguistics: violation of the single-subject-population assumption. Estimated random-effects distributions in real studies often appear to be bi- or multimodal. However, there is no established way to determine whether a random-effects distribution corresponds to more than one underlying population, especially in the more common case of a multivariate distribution of random effects. We show that violations of the single-subject-population assumption can usually be detected by assessing the (multivariate) normality of the inferred random-effects structure, unless the data show quasi-separability, i.e., many subjects or items show near-categorical behavior. In the absence of quasi-separability, several clustering methods successfully determine which group each participant belongs to. The BIC difference between a two-cluster and a one-cluster solution can be used to determine whether subjects (or items) come from more than one population. The researcher can then define and justify a new post hoc variable specifying the group to which each participant or item belongs and incorporate it into the regression analysis.
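To make the BIC comparison concrete, here is a minimal sketch, not the authors' procedure, under several simplifying assumptions. It treats a single random effect (for example, by-subject random intercepts taken from a fitted logistic mixed model), whereas the study also addresses the multivariate case. It uses a two-component Gaussian mixture fitted by EM as the clustering method, chosen here only for illustration. All function names (`one_cluster_loglik`, `two_cluster_loglik`, `delta_bic`) and the input values are hypothetical. BIC is computed as k ln(n) - 2 ln L, and `delta_bic` is defined so that a positive value means the two-cluster solution has the lower BIC; no particular decision threshold is implied.

```python
import math
from statistics import NormalDist, fmean, pvariance


def normal_logpdf(x, mu, var):
    """Log density of N(mu, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def log_sum_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def one_cluster_loglik(xs):
    """Maximized log-likelihood of a single Gaussian (2 free parameters)."""
    mu = fmean(xs)
    var = pvariance(xs, mu)  # maximum-likelihood (population) variance
    return sum(normal_logpdf(x, mu, var) for x in xs)


def two_cluster_loglik(xs, n_iter=500, tol=1e-9):
    """Log-likelihood of a two-component Gaussian mixture fitted by EM
    (5 free parameters: 2 means, 2 variances, 1 mixing weight)."""
    s, n = sorted(xs), len(xs)
    total_var = pvariance(xs)
    floor = 1e-3 * total_var           # stops a component collapsing onto one point
    mu = [s[n // 4], s[(3 * n) // 4]]  # start near the lower and upper quartiles
    var = [total_var, total_var]
    w = [0.5, 0.5]

    def e_step():
        """Return the log-likelihood and P(component 0 | x) for every x."""
        ll, resp = 0.0, []
        for x in xs:
            a = math.log(w[0]) + normal_logpdf(x, mu[0], var[0])
            b = math.log(w[1]) + normal_logpdf(x, mu[1], var[1])
            lse = log_sum_exp(a, b)
            ll += lse
            resp.append(math.exp(a - lse))
        return ll, resp

    prev = -math.inf
    ll, resp = e_step()
    for _ in range(n_iter):
        if ll - prev < tol:  # log-likelihood has stopped improving
            break
        prev = ll
        r0 = sum(resp)
        r1 = n - r0
        if min(r0, r1) < 1e-9:  # one component is empty; stop
            break
        # M-step: responsibility-weighted means, variances and mixing weights
        mu = [sum(r * x for r, x in zip(resp, xs)) / r0,
              sum((1 - r) * x for r, x in zip(resp, xs)) / r1]
        var = [max(floor, sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / r0),
               max(floor, sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / r1)]
        w = [r0 / n, r1 / n]
        ll, resp = e_step()
    return ll


def bic(loglik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2.0 * loglik


def delta_bic(xs):
    """BIC(one cluster) - BIC(two clusters); positive values favour two clusters."""
    n = len(xs)
    return bic(one_cluster_loglik(xs), 2, n) - bic(two_cluster_loglik(xs), 5, n)


def normal_quantiles(mu, sd, k):
    """Hypothetical, deterministic 'random intercepts': k evenly spaced normal quantiles."""
    d = NormalDist(mu, sd)
    return [d.inv_cdf((i + 0.5) / k) for i in range(k)]


one_population = normal_quantiles(0.0, 1.0, 120)
two_populations = normal_quantiles(-2.0, 0.5, 60) + normal_quantiles(2.0, 0.5, 60)
print(delta_bic(one_population) > 0, delta_bic(two_populations) > 0)
```

In practice the input would be the estimated random effects from a fitted model rather than synthetic quantiles. Per the abstract, clustering-based group assignment is expected to work in the absence of quasi-separability, so a check of that kind belongs before any such comparison.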