Department of Medical Informatics, West China Medical School, Sichuan University, Chengdu, China.
West China College of Stomatology, Sichuan University, Chengdu, China.
J Med Internet Res. 2023 Dec 29;25:e51501. doi: 10.2196/51501.
Artificial intelligence models tailored to diagnosing cognitive impairment have shown excellent results. However, it remains unclear whether large language models can rival specialized models using text alone.
In this study, we explored the performance of ChatGPT for primary screening of mild cognitive impairment (MCI) and standardized the design steps and components of the prompts.
We gathered a total of 174 participants from DementiaBank and assigned 70% of them to the training set and 30% to the test set. Only text dialogues were retained. Sentences were cleaned using a macro code and then checked manually. The prompt consisted of 5 main parts: character setting, scoring system setting, indicator setting, output setting, and explanatory information setting. Three dimensions of variables from published studies were included: vocabulary (ie, word frequency and word ratio, phrase frequency and phrase ratio, and lexical complexity), syntax and grammar (ie, syntactic complexity and grammatical components), and semantics (ie, semantic density and semantic coherence). We used R 4.3.0 for the analysis of variables and diagnostic indicators.
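The 70/30 participant split and the 5-part prompt structure described above can be sketched as follows. This is a minimal illustration only: the section texts, participant IDs, and random seed are hypothetical placeholders, not the study's actual prompt wording or assignment procedure.

```python
import random

# Hypothetical 70/30 split of 174 participants (IDs and seed are
# illustrative; the study's actual assignment procedure is not specified).
def split_participants(ids, train_frac=0.7, seed=0):
    rng = random.Random(seed)
    shuffled = ids[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train, test = split_participants(list(range(174)))  # 121 train, 53 test

# Assembling the 5 prompt components in the order listed above.
# The wording of each section is a placeholder, not the published prompt.
PROMPT_SECTIONS = {
    "character": "You are a clinician screening transcripts for signs of MCI.",
    "scoring_system": "Rate each transcript from 0 (normal) to 10 (severe).",
    "indicators": "Consider vocabulary, syntax and grammar, and semantics.",
    "output": "Return a score and a brief rationale.",
    "explanation": "Definitions and examples for each indicator go here.",
}

def build_prompt(sections: dict) -> str:
    order = ["character", "scoring_system", "indicators", "output", "explanation"]
    return "\n\n".join(sections[key] for key in order)

prompt = build_prompt(PROMPT_SECTIONS)
```

Keeping the components as named sections makes it straightforward for clinicians to standardize and revise each part independently, which is the design goal the abstract describes.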
Three additional indicators related to the severity of MCI were incorporated into the final prompt for the model. These indicators were effective in discriminating between MCI and cognitively normal participants: tip-of-the-tongue phenomenon (P<.001), difficulty with complex ideas (P<.001), and memory issues (P<.001). The final GPT-4 model achieved a sensitivity of 0.8636, a specificity of 0.9487, and an area under the curve of 0.9062 on the training set; on the test set, the sensitivity, specificity, and area under the curve reached 0.7727, 0.8333, and 0.8030, respectively.
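The reported AUC values are consistent with the balanced-accuracy formula: for a classifier evaluated at a single operating point, the trapezoidal ROC area reduces to (sensitivity + specificity) / 2. A minimal arithmetic check against the figures above (the helper function name is ours, not the study's):

```python
# For a single-threshold binary classifier, the ROC curve has one
# operating point, so the trapezoidal AUC equals balanced accuracy.
def binary_auc(sensitivity: float, specificity: float) -> float:
    return (sensitivity + specificity) / 2

train_auc = binary_auc(0.8636, 0.9487)  # ~0.9062, the reported training AUC
test_auc = binary_auc(0.7727, 0.8333)   # ~0.8030, the reported test AUC
```

This check suggests the reported AUCs were computed from the single sensitivity/specificity operating point rather than from a continuous score threshold sweep.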
ChatGPT was effective in the primary screening of participants with possible MCI. Improved standardization of prompts by clinicians would also improve the performance of the model. It is important to note that ChatGPT is not a substitute for a clinician making a diagnosis.