Oranim Academic College, Tivon, Israel
Department of Psychology and Educational Counseling, Max Stern Academic College of Emek Yezreel, Emek Yezreel, Israel
Fam Med Community Health. 2023 Sep;11(4). doi: 10.1136/fmch-2023-002391.
To compare evaluations of depressive episodes and suggested treatment protocols generated by Chat Generative Pretrained Transformer (ChatGPT)-3.5 and ChatGPT-4 with the recommendations of primary care physicians.
Vignettes were input to the ChatGPT interface. These vignettes focused primarily on hypothetical patients with symptoms of depression during initial consultations. The creators of these vignettes meticulously designed eight distinct versions in which they systematically varied patient attributes (sex, socioeconomic status (blue collar worker or white collar worker) and depression severity (mild or severe)). Each variant was subsequently introduced into ChatGPT-3.5 and ChatGPT-4. Each vignette was repeated 10 times to ensure consistency and reliability of the ChatGPT responses.
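The design described above can be enumerated in a few lines. The following is only an illustrative sketch of the 2 x 2 x 2 vignette layout and the repetition count; the variable names and label strings are assumptions made for this example and are not taken from the study's materials.

```python
from itertools import product

# Attribute levels as described in the methods; the exact label strings are assumed.
SEXES = ("female", "male")
OCCUPATIONS = ("blue collar", "white collar")
SEVERITIES = ("mild", "severe")
MODELS = ("ChatGPT-3.5", "ChatGPT-4")
REPEATS = 10  # each vignette was repeated 10 times per model

# Eight vignette versions: every combination of sex, occupation and severity.
vignettes = list(product(SEXES, OCCUPATIONS, SEVERITIES))

# One entry per model query: (model, vignette, repetition number).
runs = [
    (model, vignette, repetition)
    for model in MODELS
    for vignette in vignettes
    for repetition in range(1, REPEATS + 1)
]

# Number of responses per model at each severity level.
per_model_severity = {
    (model, severity): sum(
        1 for m, v, _ in runs if m == model and v[2] == severity
    )
    for model in MODELS
    for severity in SEVERITIES
}

print(len(vignettes), len(runs), per_model_severity[("ChatGPT-4", "mild")])
```

Under this assumed layout there are 8 vignettes and 160 queries in total, with 40 responses per model at each severity level.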
For mild depression, ChatGPT-3.5 and ChatGPT-4 recommended psychotherapy in 95.0% and 97.5% of cases, respectively. Primary care physicians, however, recommended psychotherapy in only 4.3% of cases. For severe cases, ChatGPT favoured an approach that combined psychotherapy, while primary care physicians recommended a combined approach. The pharmacological recommendations of ChatGPT-3.5 and ChatGPT-4 showed a preference for exclusive use of antidepressants (74% and 68%, respectively), in contrast with primary care physicians, who typically recommended a mix of antidepressants and anxiolytics/hypnotics (67.4%). Unlike primary care physicians, ChatGPT showed no gender or socioeconomic biases in its recommendations.
ChatGPT-3.5 and ChatGPT-4 aligned well with accepted guidelines for managing mild and severe depression, without showing the gender or socioeconomic biases observed among primary care physicians. Despite the suggested potential benefit of using artificial intelligence (AI) chatbots like ChatGPT to enhance clinical decision making, further research is needed to refine AI recommendations for severe cases and to consider potential risks and ethical issues.