Farzan Artificial Intelligence Team, Farzan Clinical Research Institute, Tehran, Islamic Republic of Iran.
Centre for Health Services Research, Faculty of Medicine, The University of Queensland, Brisbane, Australia; School of Psychological Sciences, Monash University, Melbourne, Australia.
Asian J Psychiatr. 2024 Oct;100:104168. doi: 10.1016/j.ajp.2024.104168. Epub 2024 Jul 25.
Medical decision-making is crucial for effective treatment, especially in psychiatry, where diagnosis often relies on subjective patient reports and symptoms are rarely highly specific. Artificial intelligence (AI), particularly large language models (LLMs) such as GPT, has emerged as a promising tool for improving diagnostic accuracy in psychiatry. This comparative study evaluates the diagnostic capabilities of several AI models, including Aya, GPT-3.5, GPT-4, GPT-3.5 clinical assistant (CA), Nemotron, and Nemotron CA, on clinical cases from the DSM-5 Clinical Cases book.
We curated 20 clinical cases from the DSM-5 Clinical Cases book, covering a wide range of psychiatric diagnoses. Four AI models (GPT-3.5 Turbo, GPT-4, Aya, Nemotron) were prompted to produce a detailed diagnosis and the reasoning behind it. Model performance was evaluated on diagnostic accuracy and quality of reasoning, with an additional analysis of models given access to the DSM-5 text through Retrieval Augmented Generation (RAG).
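As an illustration of the accuracy part of this evaluation, the following is a minimal sketch of how per-model diagnostic accuracy could be tallied. The abstract does not state how a diagnosis was judged correct, so the case-insensitive exact match used here is an assumption, and the function name `accuracy_by_model`, the model labels, and the toy records are hypothetical and are not data from the study.

```python
from collections import defaultdict


def accuracy_by_model(records):
    """Return the fraction of correct diagnoses per model.

    records: iterable of (model, predicted_diagnosis, reference_diagnosis).
    Correctness is assumed to be a case-insensitive exact match; the study
    may well have used a different criterion (e.g. clinician rating).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for model, predicted, reference in records:
        total[model] += 1
        if predicted.strip().lower() == reference.strip().lower():
            correct[model] += 1
    return {model: correct[model] / total[model] for model in total}


# Made-up records for illustration only; not results from the paper.
toy_records = [
    ("model_a", "Schizophrenia", "schizophrenia"),
    ("model_a", "Bipolar I disorder", "Cyclothymic disorder"),
    ("model_b", "Schizophrenia", "Schizophrenia"),
    ("model_b", "Cyclothymic disorder", "cyclothymic disorder "),
]

scores = accuracy_by_model(toy_records)
print(scores)
```

Exact matching is the simplest possible criterion and would likely undercount answers that name the right disorder with different wording; a rubric scored by clinicians would be a natural alternative, and would also be needed for the reasoning-quality measure.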
The AI models varied in diagnostic accuracy, with GPT-3.5 and GPT-4 performing notably better than Aya and Nemotron in both accuracy and reasoning quality. The models struggled with specific disorders such as cyclothymic disorder and disruptive mood dysregulation disorder, but performed well on others, particularly psychotic and bipolar disorders. Statistical analysis showed significant differences in accuracy and reasoning between models, in favor of the GPT models.
The application of AI in psychiatry offers potential improvements in diagnostic accuracy. The superior performance of the GPT models can be attributed to their advanced natural language processing capabilities and extensive training on diverse text data, enabling more effective interpretation of psychiatric language. However, models like Aya and Nemotron showed limitations in reasoning, indicating a need for further refinement in their training and application.
AI holds significant promise for enhancing psychiatric diagnostics, with certain models demonstrating high potential in interpreting complex clinical descriptions accurately. Future research should focus on expanding the dataset and integrating multimodal data to further enhance the diagnostic capabilities of AI in psychiatry.