Bittar André, Velupillai Sumithra, Roberts Angus, Dutta Rina
Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.
South London and Maudsley NHS Foundation Trust, London, United Kingdom.
JMIR Med Inform. 2021 Apr 13;9(4):e22397. doi: 10.2196/22397.
Suicide is a serious public health issue, accounting for 1.4% of all deaths worldwide. Current risk assessment tools are reported as performing little better than chance in predicting suicide. New methods for studying dynamic features in electronic health records (EHRs) are being increasingly explored. One avenue of research involves using sentiment analysis to examine clinicians' subjective judgments when reporting on patients. Several recent studies have used general-purpose sentiment analysis tools to automatically identify negative and positive words within EHRs to test correlations between sentiment extracted from the texts and specific medical outcomes (eg, risk of suicide or in-hospital mortality). However, little attention has been paid to analyzing the specific words identified by general-purpose sentiment lexicons when applied to EHR corpora.
This study aims to evaluate, both quantitatively and qualitatively, the coverage of 6 general-purpose sentiment lexicons against a corpus of EHR texts to ascertain the extent to which such lexical resources are fit for use in suicide risk assessment.
The data for this study were a corpus of 198,451 EHR texts made up of two subcorpora drawn from a 1:4 case-control study comparing clinical notes written over the period leading up to a suicide attempt (cases, n=2913) with those not preceding such an attempt (controls, n=14,727). We calculated word frequency distributions within each subcorpus to identify representative keywords for both the case and control subcorpora. We quantified the relative coverage of the 6 lexicons with respect to this list of representative keywords in terms of weighted precision, recall, and F score.
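The coverage measures described above can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, so this example assumes that each corpus term carries a numeric weight (for instance a frequency or keyness score), that precision is the weighted share of a lexicon's in-corpus entries that are keywords, and that recall is the weighted share of keywords found in the lexicon. The function name `weighted_coverage` and the toy terms and weights are hypothetical and are not taken from the study.

```python
from typing import Dict, Set, Tuple


def weighted_coverage(
    lexicon: Set[str],
    keywords: Set[str],
    weights: Dict[str, float],
) -> Tuple[float, float, float]:
    """Return (precision, recall, F score) of a lexicon against a keyword list.

    Assumed definitions (not confirmed by the source):
      precision = weight of lexicon entries that are keywords
                  / weight of lexicon entries that occur in the corpus
      recall    = weight of lexicon entries that are keywords
                  / weight of all keywords
    """
    # Only lexicon entries that actually occur in the corpus are counted.
    in_corpus = {term for term in lexicon if term in weights}
    hits = in_corpus & keywords

    hit_weight = sum(weights[t] for t in hits)
    lexicon_weight = sum(weights[t] for t in in_corpus)
    keyword_weight = sum(weights[t] for t in keywords if t in weights)

    precision = hit_weight / lexicon_weight if lexicon_weight else 0.0
    recall = hit_weight / keyword_weight if keyword_weight else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Hypothetical toy data, for illustration only.
toy_weights = {"overdose": 4.0, "ligature": 3.0, "hopeless": 2.0, "pleasant": 1.0}
toy_keywords = {"overdose", "ligature", "hopeless"}
toy_lexicon = {"hopeless", "pleasant", "awful"}  # "awful" is absent from the corpus

p, r, f = weighted_coverage(toy_lexicon, toy_keywords, toy_weights)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

In this toy case the lexicon matches only one of the three keywords, so recall is low even though most of the lexicon's in-corpus weight falls on a keyword. A lexicon can therefore show moderate precision together with very low recall.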
The 6 lexicons achieved reasonable precision (0.53-0.68) but very low recall (0.04-0.36). Many of the most representative keywords in the suicide-related (case) subcorpus were not identified by any of the lexicons. The sentiment-bearing status of these keywords for this use case is thus doubtful.
Our findings indicate that these 6 sentiment lexicons are not optimal for use in suicide risk assessment. We propose a set of guidelines for the creation of more suitable lexical resources for distinguishing suicide-related from non-suicide-related EHR texts.