Assessing dimensions of thought disorder with large language models: The tradeoff of accuracy and consistency.

Affiliations

Department of Computer Science, University of Colorado Boulder, United States; Institute of Cognitive Science, University of Colorado Boulder, United States.

Institute of Cognitive Science, University of Colorado Boulder, United States.

Publication information

Psychiatry Res. 2024 Nov;341:116119. doi: 10.1016/j.psychres.2024.116119. Epub 2024 Aug 3.

DOI:10.1016/j.psychres.2024.116119
PMID:39226873
Abstract

Natural Language Processing (NLP) methods have shown promise for the assessment of formal thought disorder, a hallmark feature of schizophrenia in which disturbances to the structure, organization, or coherence of thought can manifest as disordered or incoherent speech. We investigated the suitability of modern Large Language Models (LLMs - e.g., GPT-3.5, GPT-4, and Llama 3) to predict expert-generated ratings for three dimensions of thought disorder (coherence, content, and tangentiality) assigned to speech samples collected from both patients with a diagnosis of schizophrenia (n = 26) and healthy control participants (n = 25). In addition to (1) evaluating the accuracy of LLM-generated ratings relative to human experts, we also (2) investigated the degree to which the LLMs produced consistent ratings across multiple trials, and we (3) sought to understand the factors that impacted the consistency of LLM-generated output. We found that machine-generated ratings of the level of thought disorder in speech matched favorably those of expert humans, and we identified a tradeoff between accuracy and consistency in LLM ratings. Unlike traditional NLP methods, LLMs were not always consistent in their predictions, but these inconsistencies could be mitigated with careful parameter selection and ensemble methods. We discuss implications for NLP-based assessment of thought disorder and provide recommendations of best practices for integrating these methods in the field of psychiatry.
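The two ideas in the abstract, checking how consistent repeated LLM ratings are and using an ensemble to reduce inconsistency, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the ratings are invented, not data from the study; the function names are hypothetical; and the choice of per-sample standard deviation as a consistency measure, Pearson correlation as an accuracy measure, and a simple mean as the ensemble is not taken from the paper, whose abstract does not specify its metrics.

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation between two equal-length rating lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def per_sample_spread(trials):
    """Standard deviation of ratings across trials, one value per sample.
    Lower values mean the model rated that sample more consistently."""
    return [pstdev(column) for column in zip(*trials)]

def ensemble(trials):
    """Average the ratings across trials, one value per sample."""
    return [mean(column) for column in zip(*trials)]

# Invented expert ratings for five speech samples (illustrative 1-5 scale).
expert = [1, 2, 3, 4, 5]

# Invented LLM ratings: one row per repeated trial of the same prompt.
trials = [
    [1, 3, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 2, 4, 4, 4],
]

single_trial_r = [pearson(t, expert) for t in trials]
ensemble_r = pearson(ensemble(trials), expert)

print("spread per sample:", per_sample_spread(trials))
print("accuracy per trial:", single_trial_r)
print("accuracy of ensemble:", ensemble_r)
```

In this toy data the averaged ratings track the expert ratings more closely than any single trial does, which is the kind of effect an ensemble is meant to produce; whether it holds on real data depends on the model, the prompt, and the sampling parameters.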

Similar articles

1
Assessing dimensions of thought disorder with large language models: The tradeoff of accuracy and consistency.
Psychiatry Res. 2024 Nov;341:116119. doi: 10.1016/j.psychres.2024.116119. Epub 2024 Aug 3.
2
An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study.
JMIR Med Inform. 2024 Apr 8;12:e55318. doi: 10.2196/55318.
3
Quantifying incoherence in speech: an automated methodology and novel application to schizophrenia.
Schizophr Res. 2007 Jul;93(1-3):304-16. doi: 10.1016/j.schres.2007.03.001. Epub 2007 Apr 16.
4
Speech disturbances in schizophrenia: Assessing cross-linguistic generalizability of NLP automated measures of coherence.
Schizophr Res. 2023 Sep;259:59-70. doi: 10.1016/j.schres.2022.07.002. Epub 2022 Aug 1.
5
Quality of Answers of Generative Large Language Models Versus Peer Users for Interpreting Laboratory Test Results for Lay Patients: Evaluation Study.
J Med Internet Res. 2024 Apr 17;26:e56655. doi: 10.2196/56655.
6
Large Language Models Can Enable Inductive Thematic Analysis of a Social Media Corpus in a Single Prompt: Human Validation Study.
JMIR Infodemiology. 2024 Aug 29;4:e59641. doi: 10.2196/59641.
7
Computational linguistic analysis applied to a semantic fluency task to measure derailment and tangentiality in schizophrenia.
Psychiatry Res. 2018 May;263:74-79. doi: 10.1016/j.psychres.2018.02.037. Epub 2018 Feb 17.
8
Theory-Driven Analysis of Natural Language Processing Measures of Thought Disorder Using Generative Language Modeling.
Biol Psychiatry Cogn Neurosci Neuroimaging. 2023 Oct;8(10):1013-1023. doi: 10.1016/j.bpsc.2023.05.005. Epub 2023 May 29.
9
Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation Study.
ArXiv. 2024 Jan 23:arXiv:2402.01693v1.
10
Latent semantic variables are associated with formal thought disorder and adaptive behavior in older inpatients with schizophrenia.
Cortex. 2014 Jun;55:88-96. doi: 10.1016/j.cortex.2013.02.006. Epub 2013 Feb 19.

Cited by

1
The Application and Ethical Implication of Generative AI in Mental Health: Systematic Review.
JMIR Ment Health. 2025 Jun 27;12:e70610. doi: 10.2196/70610.
2
Artificial Intelligence in Psychiatry: A Review of Biological and Behavioral Data Analyses.
Diagnostics (Basel). 2025 Feb 11;15(4):434. doi: 10.3390/diagnostics15040434.
3
Bridging the gap: a practical step-by-step approach to warrant safe implementation of large language models in healthcare.
Front Artif Intell. 2025 Jan 27;8:1504805. doi: 10.3389/frai.2025.1504805. eCollection 2025.