Tsui Laboratory, Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA.
MindCORE and Cognitive Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
J Am Med Inform Assoc. 2023 Jul 19;30(8):1379-1388. doi: 10.1093/jamia/ocad046.
Social determinants of health (SDOH) are nonclinical, socioeconomic conditions that influence patient health and quality of life. Identifying SDOH may help clinicians target interventions. However, SDOH are documented more often in narrative notes than in structured electronic health record data. The 2022 n2c2 Track 2 competition released clinical notes annotated for SDOH to promote the development of natural language processing (NLP) systems for SDOH extraction. We developed a system that addresses 3 limitations of state-of-the-art SDOH extraction: the inability to identify multiple SDOH events of the same type in one sentence, to handle SDOH attributes that overlap within a text span, and to capture SDOH that span multiple sentences.
We developed and evaluated a 2-stage architecture. In stage 1, we trained a BioClinical-BERT-based named entity recognition (NER) system to extract SDOH event triggers, that is, text spans indicating substance use, employment, or living status. In stage 2, we trained a multitask, multilabel NER system to extract arguments (eg, alcohol "type") for the events extracted in stage 1. We evaluated the system with precision, recall, and F1 scores across 3 subtasks that differed in the provenance of the training and validation data.
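The multilabel idea in stage 2 can be illustrated with a minimal sketch. It assumes each token receives an independent score per argument label, so one token may belong to several arguments at once. The function `decode_multilabel`, the label names other than "Type", the 0.5 threshold, and the scores themselves are hypothetical and chosen for illustration; they are not taken from the described system.

```python
from typing import Dict, List


def decode_multilabel(
    tokens: List[str],
    scores: List[Dict[str, float]],
    threshold: float = 0.5,  # assumed cutoff, not from the paper
) -> Dict[str, List[str]]:
    """Collect, per label, the contiguous token runs scoring at or above the threshold."""
    spans: Dict[str, List[str]] = {}
    labels = list(scores[0].keys()) if scores else []
    for label in labels:
        current: List[str] = []
        for token, token_scores in zip(tokens, scores):
            if token_scores[label] >= threshold:
                current.append(token)
            elif current:
                # The run for this label just ended; store it as one span.
                spans.setdefault(label, []).append(" ".join(current))
                current = []
        if current:
            spans.setdefault(label, []).append(" ".join(current))
    return spans


# Made-up per-token scores for an alcohol event triggered by "drinks".
tokens = ["drinks", "two", "beers", "daily"]
scores = [
    {"Type": 0.1, "Amount": 0.1, "Frequency": 0.0},
    {"Type": 0.2, "Amount": 0.9, "Frequency": 0.1},
    {"Type": 0.9, "Amount": 0.8, "Frequency": 0.1},
    {"Type": 0.0, "Amount": 0.1, "Frequency": 0.95},
]
result = decode_multilabel(tokens, scores)
print(result)
```

In this toy input the token "beers" is part of both the "Type" span and the "Amount" span, which a single-label tagger that assigns exactly one tag per token could not represent.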
When trained and validated on data from the same site, we achieved 0.87 precision, 0.89 recall, and 0.88 F1. Across all subtasks, we ranked between second and fourth place in the competition and were always within 0.02 F1 of first place.
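As a quick consistency check, F1 is the harmonic mean of precision and recall, and the reported values agree with that definition after rounding. The helper name `f1_score` below is illustrative only, and the sketch assumes the three reported figures were computed over the same set of predictions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported same-site precision and recall from the abstract.
same_site_f1 = f1_score(0.87, 0.89)
print(round(same_site_f1, 2))
```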
Our 2-stage, deep-learning-based NLP system effectively extracted SDOH events from clinical notes. This was achieved with a novel classification framework that leveraged simpler architectures than state-of-the-art systems. Improved SDOH extraction may help clinicians improve health outcomes.