Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing.

Author information

Tsui Laboratory, Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, USA.

Tsui Laboratory, Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, USA; Perelman School of Medicine, University of Pennsylvania, PA, USA.

Publication information

J Biomed Inform. 2022 Mar;127:103984. doi: 10.1016/j.jbi.2021.103984. Epub 2022 Jan 7.

Abstract

OBJECTIVE

Social determinants of health (SDOH) are non-medical factors that can profoundly impact patient health outcomes. However, SDOH are rarely available in structured electronic health record (EHR) data such as diagnosis codes, and are more commonly found in unstructured narrative clinical notes. Hence, identifying social context from unstructured EHR data has become increasingly important. Yet previous work on using natural language processing to automate extraction of SDOH from text (a) usually focuses on an ad hoc selection of SDOH, and (b) does not use the latest advances in deep learning. Our objective was to advance automatic extraction of SDOH from clinical text by (a) systematically creating a set of SDOH based on standard biomedical and psychiatric ontologies, and (b) training state-of-the-art deep neural networks to extract mentions of these SDOH from clinical notes.

DESIGN

A retrospective cohort study.

SETTING AND PARTICIPANTS

Data were extracted from the Medical Information Mart for Intensive Care (MIMIC-III) database. The corpus comprised 3,504 social-related sentences from 2,670 clinical notes.

METHODS

We developed a framework for automated classification of multiple SDOH categories. Our dataset comprised narrative clinical notes under the "Social Work" category in the MIMIC-III Clinical Database. Using standard terminologies, SNOMED-CT and DSM-IV, we systematically curated a set of 13 SDOH categories and created annotation guidelines for these. After manually annotating the 3,504 sentences, we developed and tested three deep neural network (DNN) architectures - convolutional neural network (CNN), long short-term memory (LSTM) network, and the Bidirectional Encoder Representations from Transformers (BERT) - for automated detection of eight SDOH categories. We also compared these DNNs to three baseline models: (1) cTAKES, as well as (2) L2-regularized logistic regression and (3) random forests on bags-of-words. Model evaluation metrics included micro- and macro-F1, and area under the receiver operating characteristic curve (AUC).
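The difference between the micro- and macro-averaged F1 scores named above can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the toy sentences, and the "housing" label are assumptions made for illustration only ("occupational" and "non-SDOH" are category names that appear in the abstract). Micro-F1 pools true positives, false positives, and false negatives across all labels before computing F1, while macro-F1 computes F1 per label and then takes the unweighted mean.

```python
from typing import Dict, List, Set


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    gold: List[Set[str]], pred: List[Set[str]], labels: List[str]
) -> Dict[str, float]:
    """Compute micro- and macro-averaged F1 for multi-label sentence annotations."""
    # Per-label counts stored as [tp, fp, fn].
    counts = {label: [0, 0, 0] for label in labels}
    for gold_set, pred_set in zip(gold, pred):
        for label in labels:
            if label in pred_set and label in gold_set:
                counts[label][0] += 1  # true positive
            elif label in pred_set:
                counts[label][1] += 1  # false positive
            elif label in gold_set:
                counts[label][2] += 1  # false negative

    # Macro: average the per-label F1 scores, giving each label equal weight.
    per_label = [f1_from_counts(*c) for c in counts.values()]
    macro = sum(per_label) / len(per_label)

    # Micro: pool the counts over all labels, then compute a single F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1_from_counts(tp, fp, fn)

    return {"micro": micro, "macro": macro}


# Toy data (invented for illustration; "housing" is a hypothetical label).
labels = ["occupational", "housing", "non-SDOH"]
gold = [{"occupational"}, {"housing"}, {"non-SDOH"}, {"occupational", "housing"}]
pred = [{"occupational"}, {"occupational"}, {"non-SDOH"}, {"housing"}]

scores = micro_macro_f1(gold, pred, labels)
print(scores)
```

On imbalanced label sets the two averages can diverge noticeably, because macro-F1 weights a rare category as heavily as a common one, which is one plausible reason to report both.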

RESULTS

All three DNN models accurately classified all SDOH categories (minimum micro-F1 = 0.632, minimum macro-AUC = 0.854). Compared to the CNN and LSTM, BERT performed best in most key metrics (micro-F1 = 0.690, macro-AUC = 0.907). The BERT model most effectively identified the "occupational" category (F1 = 0.774, AUC = 0.965) and least effectively identified the "non-SDOH" category (F1 = 0.491, AUC = 0.788). BERT outperformed cTAKES in distinguishing social vs. non-social sentences (BERT F1 = 0.87 vs. cTAKES F1 = 0.06), and outperformed logistic regression (micro-F1 = 0.649, macro-AUC = 0.696) and random forest (micro-F1 = 0.502, macro-AUC = 0.523) trained on bag-of-words.

CONCLUSIONS

Our study framework with DNN models demonstrated improved performance for efficiently identifying a systematic range of SDOH categories from clinical notes in the EHR. Improved identification of patient SDOH may further improve healthcare outcomes.
