
Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study.

Affiliations

Instituto Federal de Educação, Ciência e Tecnologia do Ceará, Fortaleza, Brasil.

Universidade Federal do Delta do Parnaíba, Parnaíba, Brasil.

Publication information

Cad Saude Publica. 2024 Nov 25;40(10):e00028824. doi: 10.1590/0102-311XEN028824. eCollection 2024.

DOI: 10.1590/0102-311XEN028824
PMID: 39607132
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11654116/
Abstract

Artificial intelligence can detect manifestations of suicidal ideation in texts. Studies demonstrate that BERT-based models achieve better performance in text classification problems. Large language models (LLMs) answer free-text queries without being specifically trained. This work aims to compare the performance of three variations of BERT models and three LLMs (Google Bard, Microsoft Bing/GPT-4, and OpenAI ChatGPT-3.5) in identifying suicidal ideation from nonclinical texts written in Brazilian Portuguese. A dataset labeled by psychologists consisted of 2,691 sentences without suicidal ideation and 1,097 with suicidal ideation, of which 100 sentences were selected for testing. We applied data preprocessing techniques, hyperparameter optimization, and hold-out cross-validation for training and testing the BERT models. When evaluating the LLMs, we used zero-shot prompt engineering. Each test sentence was labeled as containing suicidal ideation or not, according to the chatbot's response. Bing/GPT-4 achieved the best performance, with 98% across all metrics. Fine-tuned BERT models outperformed the other LLMs: BERTimbau-Large performed best with 96% accuracy, followed by BERTimbau-Base with 94% and BERT-Multilingual with 87%. Bard performed the worst with 62% accuracy, whereas ChatGPT-3.5 achieved 81%. The high recall capacity of the models suggests a low misclassification rate of at-risk patients, which is crucial to prevent missed interventions by professionals. However, despite their potential in supporting suicidal ideation detection, these models have not been validated in a patient monitoring clinical setting. Therefore, caution is advised when using the evaluated models as tools to assist healthcare professionals in detecting suicidal ideation.
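The abstract reports accuracy and recall but does not say how they were computed. The sketch below shows one common way to derive accuracy, precision, recall, and F1 for a binary task, treating "suicidal ideation present" as the positive class (label 1). The function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are all hypothetical illustrations; none of the numbers come from the study.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(gold: list[int], pred: list[int]) -> BinaryMetrics:
    """Compute binary metrics with label 1 (ideation present) as the positive class."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # at-risk, flagged
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)  # not at-risk, not flagged
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false alarm
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # missed at-risk sentence

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(gold)
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical toy labels for illustration only, NOT data from the study.
gold = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(gold, pred)
print(metrics)
```

Recall is the quantity the abstract ties to missed interventions: every false negative (`fn`) is a sentence with suicidal ideation that the model failed to flag, so high recall means few such misses.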



