Psychol Assess. 2020 May;32(5):473-492. doi: 10.1037/pas0000808. Epub 2020 Feb 6.
In the present study, the author employed tools and principles from machine learning to investigate four questions related to the generalizability of statistical prediction in psychological assessment. First, to what extent do predictive methods common to psychology research and machine learning actually generalize to new data points in new settings? Second, of what practical value is parsimony in applied prediction? Third, what is the most effective way to select model predictors when attempting to maximize generalizability? Fourth, how do the methods considered compare with one another with respect to prediction generalizability? To address these questions, the author developed various types of predictive models from Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF) scales, using multiple prediction criteria, in a calibration inpatient sample, and then externally validated those models by applying them to one or two clinical samples from other settings. Model generalizability was evaluated on the basis of prediction accuracy in the external validation samples. Noteworthy findings include (a) statistical models generally demonstrated performance shrinkage across settings regardless of modeling approach, though they nevertheless tended to retain non-negligible predictive power in new settings; (b) of the modeling approaches considered, regularized (penalized) regression methods appeared to produce the most consistently robust predictions across settings; (c) parsimony appeared more likely to reduce than to enhance model generalizability; and (d) multivariate models whose predictors were selected automatically tended to perform relatively well, often producing substantially more generalizable predictions than models whose predictors were selected on theoretical grounds. (PsycInfo Database Record (c) 2020 APA, all rights reserved).
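The calibrate-then-externally-validate workflow the abstract describes can be illustrated with a minimal sketch using a regularized (ridge) regression, the family of methods the findings favor. This is not the author's actual pipeline: the synthetic data, the number of scales, and the use of scikit-learn's RidgeCV are assumptions for illustration only.

```python
# Minimal sketch, assuming synthetic stand-ins for the study's samples:
# fit a penalized regression in a calibration sample, then index
# generalizability by prediction accuracy in an untouched external sample.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: rows = patients, columns = MMPI-2-RF scale scores.
X_calib = rng.normal(size=(500, 40))   # calibration inpatient sample
y_calib = X_calib[:, :5].sum(axis=1) + rng.normal(size=500)

X_extern = rng.normal(size=(300, 40))  # clinical sample from another setting
y_extern = X_extern[:, :5].sum(axis=1) + rng.normal(size=300)

# Fit the penalized model on the calibration sample only; RidgeCV picks the
# penalty strength by internal cross-validation within that sample.
model = make_pipeline(
    StandardScaler(),
    RidgeCV(alphas=np.logspace(-3, 3, 25)),
)
model.fit(X_calib, y_calib)

# The drop from calibration to external accuracy is the performance
# shrinkage the abstract refers to.
print("calibration R^2:", r2_score(y_calib, model.predict(X_calib)))
print("external R^2:   ", r2_score(y_extern, model.predict(X_extern)))
```

Swapping RidgeCV for LassoCV would additionally illustrate finding (d), since the lasso penalty selects predictors automatically by shrinking some coefficients to exactly zero.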