Development and validation of large language model rating scales for automatically transcribed psychological therapy sessions.

Authors

Eberhardt Steffen T, Vehlen Antonia, Schaffrath Jana, Schwartz Brian, Baur Tobias, Schiller Dominik, Hallmen Tobias, André Elisabeth, Lutz Wolfgang

Affiliations

Department of Psychology, Trier University, Trier, Germany.

Chair for Human-Centered Artificial Intelligence, Augsburg University, Wissenschaftspark 25+27, 54296, Trier, Germany.

Publication information

Sci Rep. 2025 Aug 12;15(1):29541. doi: 10.1038/s41598-025-14923-y.

DOI:10.1038/s41598-025-14923-y
PMID:40796797
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12343941/
Abstract

Rating scales have shaped psychological research, but are resource-intensive and can burden participants. Large Language Models (LLMs) offer a tool to assess latent constructs in text. This study introduces LLM rating scales, which use LLM responses instead of human ratings. We demonstrate this approach with an LLM rating scale measuring patient engagement in therapy transcripts. Automatically transcribed videos of 1,131 sessions from 155 patients were analyzed using DISCOVER, a software framework for local multimodal human behavior analysis. Llama 3.1 8B LLM rated 120 engagement items, averaging the top eight into a total score. Psychometric evaluation showed a normal distribution, strong reliability (ω = 0.953), and acceptable fit (CFI = 0.968, SRMR = 0.022), except RMSEA = 0.108. Validity was supported by significant correlations with engagement determinants (e.g., motivation, r = .413), processes (e.g., between-session efforts, r = .390), and outcomes (e.g., symptoms, r = -.304). Results remained robust across bootstrap resampling and cross-validation, accounting for nested data. The LLM rating scale exhibited strong psychometric properties, demonstrating the potential of the approach as an assessment tool. Importantly, this automated approach uses interpretable items, ensuring clear understanding of measured constructs, while supporting local implementation and protecting confidential data.
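The abstract mentions two computational steps: averaging a subset of item ratings into a total score, and bootstrap resampling that accounts for sessions nested within patients. The sketch below illustrates one plausible way to do each, and is not the authors' implementation. The function names, the way the eight items are chosen (passed in as column indices), and the choice of a patient-level (cluster) percentile bootstrap are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def total_score(item_ratings, selected_items):
    """Average the selected item columns into one total score per session.

    item_ratings: array of shape (n_sessions, n_items) with LLM item ratings.
    selected_items: column indices of the retained items (hypothetical input;
    the abstract does not say how the top eight items were chosen).
    """
    ratings = np.asarray(item_ratings, dtype=float)
    return ratings[:, selected_items].mean(axis=1)


def cluster_bootstrap_corr(x, y, patient_ids, n_boot=1000, seed=0):
    """Percentile 95% interval for Pearson r, resampling whole patients.

    Resampling patients (not single sessions) keeps each patient's sessions
    together, which is one common way to respect nested data.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    patient_ids = np.asarray(patient_ids)
    rng = np.random.default_rng(seed)

    patients = np.unique(patient_ids)
    # Row indices of the sessions belonging to each patient.
    rows = {p: np.flatnonzero(patient_ids == p) for p in patients}

    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Draw patients with replacement, then take all of their sessions.
        sampled = rng.choice(patients, size=patients.size, replace=True)
        idx = np.concatenate([rows[p] for p in sampled])
        estimates[b] = np.corrcoef(x[idx], y[idx])[0, 1]

    return np.percentile(estimates, [2.5, 97.5])
```

In use, `total_score` would be applied to the session-by-item rating matrix, and the resulting scores passed with an external measure (for example a motivation score) and the per-session patient identifiers to `cluster_bootstrap_corr`.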

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/023f/12343941/50d7e47bcf68/41598_2025_14923_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/023f/12343941/1fa26c66fe43/41598_2025_14923_Fig2_HTML.jpg
