

A case for developing domain-specific vocabularies for extracting suicide factors from healthcare notes.

Authors

Morrow Destinee, Zamora-Resendiz Rafael, Beckham Jean C, Kimbrel Nathan A, Oslin David W, Tamang Suzanne, Crivelli Silvia

Affiliations

Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Rd, Berkeley, CA, 94720, USA.

Durham Veterans Affairs Health Care System, U.S. Department of Veterans Affairs, 508 Fulton Street, Durham, NC, 27705, USA.

Publication information

J Psychiatr Res. 2022 Jul;151:328-338. doi: 10.1016/j.jpsychires.2022.04.009. Epub 2022 Apr 28.

Abstract

The onset and persistence of life events (LE) such as housing instability, job instability, and reduced social connection have been shown to increase the risk of suicide. Predictive models for suicide risk have low sensitivity to many of these factors due to under-reporting in structured electronic health records (EHR) data. In this study, we show how natural language processing (NLP) can help identify LE in clinical notes at higher rates than reported medical codes. We compare domain-specific lexicons, formulated from Unified Medical Language System (UMLS) selection, content analysis by subject matter experts (SME), and the Gravity Project, with data-driven expansion through contextual word embedding using Word2Vec. Our analysis covers EHR from the Veterans Affairs (VA) Corporate Data Warehouse (CDW) and measures the prevalence of LE across time for patients with known underlying cause of death in the National Death Index (NDI). We found that NLP methods had higher sensitivity for detecting LE relative to structured EHR (S-EHR) variables. We observed that, on average, suicide cases had higher rates of LE over time when compared to patients who died of non-suicide-related causes with no previous history of diagnosed mental illness. When used to discriminate these outcomes, the inclusion of NLP-derived variables increased the concentration of LE along the top 0.1%, 0.5% and 1% of predicted risk. LE were less informative when discriminating suicide death from non-suicide-related death for patients with diagnosed mental illness.
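As an illustration of the lexicon-based detection described in the abstract, the following is a minimal sketch of matching a small vocabulary against free-text notes and computing per-category prevalence across patients. Everything in it is an assumption made for illustration: the lexicon terms, the category names, the function names, and the example notes are hypothetical and are not the vocabulary, data, or pipeline used in the study.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon; NOT the vocabulary used in the study.
LEXICON = {
    "housing_instability": ["homeless", "evicted", "living in car"],
    "job_instability": ["unemployed", "lost his job", "laid off"],
}


def compile_lexicon(lexicon):
    """Build one case-insensitive, word-bounded regex per life-event category."""
    return {
        category: re.compile(
            r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
            re.IGNORECASE,
        )
        for category, terms in lexicon.items()
    }


def detect_life_events(note, patterns):
    """Return the set of categories with at least one lexicon hit in the note."""
    return {cat for cat, pat in patterns.items() if pat.search(note)}


def prevalence(notes_by_patient, patterns):
    """Fraction of patients with at least one note mentioning each category."""
    counts = defaultdict(int)
    for patient_notes in notes_by_patient.values():
        found = set()
        for note in patient_notes:
            found |= detect_life_events(note, patterns)
        for cat in found:
            counts[cat] += 1
    n_patients = len(notes_by_patient)
    return {cat: counts[cat] / n_patients for cat in patterns}


# Made-up example notes, for illustration only.
notes = {
    "p1": ["Patient reports being evicted last month.", "Currently unemployed."],
    "p2": ["No acute concerns."],
}
PATTERNS = compile_lexicon(LEXICON)
print(prevalence(notes, PATTERNS))
```

This sketch performs plain string matching only. It does not handle negation ("denies being homeless"), spelling variants, or context, and it does not attempt the Word2Vec-based vocabulary expansion that the abstract compares against the curated lexicons.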

