Testing domain knowledge and risk of bias of a large-scale general artificial intelligence model in mental health.

Author Information

Heinz Michael V, Bhattacharya Sukanya, Trudeau Brianna, Quist Rachel, Song Seo Ho, Lee Camilla M, Jacobson Nicholas C

Affiliations

Center for Technology and Behavioral Health, Geisel School of Medicine, Dartmouth College, Lebanon, NH, USA.

Department of Psychiatry, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA.

Publication Information

Digit Health. 2023 Apr 17;9:20552076231170499. doi: 10.1177/20552076231170499. eCollection 2023 Jan-Dec.

Abstract

BACKGROUND

With a rapidly expanding gap between the need for and availability of mental health care, artificial intelligence (AI) presents a promising, scalable solution to mental health assessment and treatment. Given the novelty and inscrutable nature of such systems, exploratory measures aimed at understanding domain knowledge and potential biases of such systems are necessary for ongoing translational development and future deployment in high-stakes healthcare settings.

METHODS

We investigated the domain knowledge and demographic bias of a generative AI model using contrived clinical vignettes with systematically varied demographic features. We used balanced accuracy (BAC) to quantify the model's performance. We used generalized linear mixed-effects models to quantify the relationship between demographic factors and model interpretation.
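Balanced accuracy averages per-class recall, so it is not inflated by imbalanced class frequencies the way raw accuracy is. The sketch below (not the study's actual pipeline; the labels and data are hypothetical) shows the metric's definition:

```python
# Balanced accuracy: the mean of per-class recall values.
# Robust to class imbalance, unlike raw accuracy.
def balanced_accuracy(y_true, y_pred):
    classes = set(y_true)
    recalls = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        total = sum(1 for t in y_true if t == c)
        recalls.append(tp / total)
    return sum(recalls) / len(recalls)

# Hypothetical example: 4 of 5 "PTSD" cases and 1 of 2 "other"
# cases labeled correctly -> (0.8 + 0.5) / 2 = 0.65.
y_true = ["PTSD", "PTSD", "PTSD", "PTSD", "PTSD", "other", "other"]
y_pred = ["PTSD", "PTSD", "PTSD", "PTSD", "other", "other", "PTSD"]
print(round(balanced_accuracy(y_true, y_pred), 2))  # → 0.65
```

In practice this is equivalent to `sklearn.metrics.balanced_accuracy_score`; a BAC of 0.5 on a two-class task corresponds to chance-level performance, which frames the disorder-level results reported below.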

FINDINGS

We found variable model performance across diagnoses; attention deficit hyperactivity disorder, posttraumatic stress disorder, alcohol use disorder, narcissistic personality disorder, binge eating disorder, and generalized anxiety disorder showed high BAC (0.70 ≤ BAC ≤ 0.82); bipolar disorder, bulimia nervosa, barbiturate use disorder, conduct disorder, somatic symptom disorder, benzodiazepine use disorder, LSD use disorder, histrionic personality disorder, and functional neurological symptom disorder showed low BAC (BAC ≤ 0.59).

INTERPRETATION

Our findings demonstrate initial promise in the domain knowledge of a large AI model, with performance variability perhaps due to the more salient hallmark symptoms, narrower differential diagnosis, and higher prevalence of some disorders. We found limited evidence of model demographic bias, although we do observe some gender and racial differences in model outcomes mirroring real-world differential prevalence estimates.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba8c/10123874/c596b8b10fd6/10.1177_20552076231170499-fig1.jpg
