Feraco Tommaso, Toffalini Enrico
Department of General Psychology, University of Padova, Padua, Italy.
Front Psychol. 2025 Feb 4;15:1433339. doi: 10.3389/fpsyg.2024.1433339. eCollection 2024.
Recent developments suggest that Large Language Models (LLMs) provide a promising approach for approximating empirical correlation matrices of item responses by utilizing item embeddings and their cosine similarities. In this paper, we introduce a novel tool, which we label SEMbeddings.
This tool integrates a fine-tuned embedding model with latent measurement models to assess model fit or misfit prior to data collection. To support this claim, we apply SEMbeddings to the 96 items of the VIA-IS-P, which measures 24 different character strengths, using responses from 31,697 participants.
Our analysis shows a significant, though not perfect, correlation (r = 0.67) between the cosine similarities of embeddings and empirical correlations among items. We then demonstrate how to fit confirmatory factor analyses on the cosine similarity matrices produced by the embedding model and interpret the outcomes using modification indices. We found that relying on traditional fit indices when using SEMbeddings can be misleading, as they often lead to more conservative conclusions than the empirical results do. Nevertheless, they provide valuable suggestions about possible misfit, and we argue that the modification indices obtained from these models could serve as a useful screening tool to make informed decisions about items prior to data collection.
As LLMs become increasingly precise and new fine-tuned models are released, these procedures have the potential to deliver more reliable results, potentially transforming the way new questionnaires are developed.
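The comparison between embedding cosine similarities and empirical item correlations described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the random toy embeddings, and the simulated responses are all assumptions made here, and the confirmatory factor analysis step is not shown. In practice the embeddings would come from the fine-tuned embedding model and the correlation matrix from real item responses.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Return the item-by-item cosine similarity matrix for row-wise embeddings."""
    x = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / norms  # scale each item embedding to unit length
    return unit @ unit.T


def lower_triangle(matrix):
    """Return the off-diagonal lower-triangle entries (each item pair once)."""
    m = np.asarray(matrix, dtype=float)
    rows, cols = np.tril_indices_from(m, k=-1)
    return m[rows, cols]


def similarity_correlation_agreement(embeddings, empirical_corr):
    """Pearson correlation between cosine similarities and empirical correlations."""
    sims = lower_triangle(cosine_similarity_matrix(embeddings))
    cors = lower_triangle(empirical_corr)
    return float(np.corrcoef(sims, cors)[0, 1])


# Hypothetical toy data: 6 items with 16-dimensional embeddings,
# and 200 simulated respondents (placeholders, not real data).
rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(6, 16))
toy_responses = rng.normal(size=(200, 6))
toy_corr = np.corrcoef(toy_responses, rowvar=False)

agreement = similarity_correlation_agreement(toy_embeddings, toy_corr)
print(f"Agreement between cosine similarities and correlations: {agreement:.2f}")
```

Only the off-diagonal entries are compared because the diagonal of both matrices is 1 by construction and would inflate the agreement. The resulting cosine similarity matrix is what would then be passed, in place of an empirical correlation matrix, to a structural equation modeling package for the factor analysis.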