Using General-purpose Sentiment Lexicons for Suicide Risk Assessment in Electronic Health Records: Corpus-Based Analysis.

Authors

Bittar André, Velupillai Sumithra, Roberts Angus, Dutta Rina

Affiliations

Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.

South London and Maudsley NHS Foundation Trust, London, United Kingdom.

Publication information

JMIR Med Inform. 2021 Apr 13;9(4):e22397. doi: 10.2196/22397.

DOI: 10.2196/22397
PMID: 33847595
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8080148/
Abstract

BACKGROUND

Suicide is a serious public health issue, accounting for 1.4% of all deaths worldwide. Current risk assessment tools are reported as performing little better than chance in predicting suicide. New methods for studying dynamic features in electronic health records (EHRs) are being increasingly explored. One avenue of research involves using sentiment analysis to examine clinicians' subjective judgments when reporting on patients. Several recent studies have used general-purpose sentiment analysis tools to automatically identify negative and positive words within EHRs to test correlations between sentiment extracted from the texts and specific medical outcomes (eg, risk of suicide or in-hospital mortality). However, little attention has been paid to analyzing the specific words identified by general-purpose sentiment lexicons when applied to EHR corpora.

OBJECTIVE

This study aims to quantitatively and qualitatively evaluate the coverage of six general-purpose sentiment lexicons against a corpus of EHR texts to ascertain the extent to which such lexical resources are fit for use in suicide risk assessment.

METHODS

The data for this study were a corpus of 198,451 EHR texts made up of two subcorpora drawn from a 1:4 case-control study comparing clinical notes written over the period leading up to a suicide attempt (cases, n=2913) with those not preceding such an attempt (controls, n=14,727). We calculated word frequency distributions within each subcorpus to identify representative keywords for both the case and control subcorpora. We quantified the relative coverage of the 6 lexicons with respect to this list of representative keywords in terms of weighted precision, recall, and F score.
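The coverage measurement described above can be illustrated with a minimal sketch. The abstract does not state how the weighting was done, so this example assumes that each word is weighted by its corpus frequency; the function name `weighted_coverage` and the toy words and counts are hypothetical and are not taken from the study.

```python
from collections import Counter


def weighted_coverage(lexicon, keywords, corpus_freq):
    """Return frequency-weighted (precision, recall, F) of a lexicon against keywords.

    Assumption: weights are raw corpus frequencies. The paper's actual
    weighting scheme is not given in the abstract.
    """
    keywords = set(keywords)
    # Only lexicon entries that occur in the corpus count as "retrieved".
    matched = {w for w in lexicon if corpus_freq.get(w, 0) > 0}
    hits = matched & keywords

    hit_weight = sum(corpus_freq.get(w, 0) for w in hits)
    matched_weight = sum(corpus_freq.get(w, 0) for w in matched)
    keyword_weight = sum(corpus_freq.get(w, 0) for w in keywords)

    # Guard against empty lexicons or keyword lists.
    precision = hit_weight / matched_weight if matched_weight else 0.0
    recall = hit_weight / keyword_weight if keyword_weight else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return precision, recall, f_score


# Hypothetical toy data, for illustration only.
toy_freq = Counter({"overdose": 10, "sad": 5, "happy": 5, "ward": 20, "good": 10})
toy_keywords = {"overdose", "sad", "ward"}
toy_lexicon = {"sad", "happy", "good", "terrible"}

precision, recall, f_score = weighted_coverage(toy_lexicon, toy_keywords, toy_freq)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_score:.2f}")
```

In this toy case the lexicon recognizes only one of the three keywords, so recall is low even though a quarter of the matched lexicon weight falls on a keyword. That is the same qualitative pattern the study reports, though the numbers here are invented.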

RESULTS

The six lexicons achieved reasonable precision (0.53-0.68) but very low recall (0.04-0.36). Many of the most representative keywords in the suicide-related (case) subcorpus were not identified by any of the lexicons. The sentiment-bearing status of these keywords for this use case is thus doubtful.

CONCLUSIONS

Our findings indicate that these 6 sentiment lexicons are not optimal for use in suicide risk assessment. We propose a set of guidelines for the creation of more suitable lexical resources for distinguishing suicide-related from non-suicide-related EHR texts.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9764/8080148/20f7d4e41944/medinform_v9i4e22397_fig1.jpg