Department of Health Outcomes and Biomedical Informatics.
Cancer Informatics Shared Resources, University of Florida Health Cancer Center, University of Florida, Gainesville, Florida, USA.
AMIA Annu Symp Proc. 2022 Feb 21;2021:1225-1233. eCollection 2021.
Social and behavioral determinants of health (SBDoH) play an important role in shaping people's health. In clinical research studies, especially comparative effectiveness studies, failure to adjust for SBDoH factors can cause confounding and misclassification errors in both statistical analyses and machine learning-based models. However, few studies have examined SBDoH factors in clinical outcomes, because current electronic health record (EHR) systems lack structured SBDoH information and much of it is documented only in clinical narratives. Natural language processing (NLP) is thus the key technology for extracting such information from unstructured clinical text, yet no mature clinical NLP system focuses on SBDoH. In this study, we examined two state-of-the-art transformer-based NLP models, BERT and RoBERTa, for extracting SBDoH concepts from clinical narratives, applied the best-performing model to extract SBDoH concepts from a lung cancer screening patient cohort, and examined the differences in SBDoH information between the NLP-extracted results and structured EHRs (SBDoH information captured in standard vocabularies such as the International Classification of Diseases codes). The experimental results show that the BERT-based NLP model achieved the best strict and lenient F1-scores of 0.8791 and 0.8999, respectively. The comparison between NLP-extracted SBDoH information and structured EHRs in the lung cancer patient cohort of 864 patients with 161,933 clinical notes of various types showed that much more detailed information about smoking, education, and employment was captured only in clinical narratives, and that it is necessary to use both clinical narratives and structured EHRs to construct a more complete picture of patients' SBDoH factors.
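The abstract reports strict and lenient F1-scores without defining them. The sketch below illustrates one common convention in clinical concept extraction, assumed here rather than taken from the paper: a strict match requires identical span boundaries and concept type, while a lenient match requires only overlapping boundaries with the same type. The function name `strict_lenient_f1`, the span representation, and the example spans are all hypothetical.

```python
from typing import List, Tuple

# A span is (start_offset, end_offset_exclusive, concept_type).
# This representation is an assumption made for illustration.
Span = Tuple[int, int, str]


def _overlaps(a: Span, b: Span) -> bool:
    """True if two spans share a concept type and at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def _f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 0.0 if total == 0 else 2 * precision * recall / total


def strict_lenient_f1(gold: List[Span], pred: List[Span]) -> Tuple[float, float]:
    """Return (strict_f1, lenient_f1) for one set of gold and predicted spans.

    Assumes each list contains unique spans and neither list is empty.
    """
    # Strict: exact boundary and type match.
    strict_tp = len(set(gold) & set(pred))
    strict = _f1(strict_tp / len(pred), strict_tp / len(gold))

    # Lenient: any overlap with a same-type span counts as a match.
    pred_hits = sum(any(_overlaps(p, g) for g in gold) for p in pred)
    gold_hits = sum(any(_overlaps(g, p) for p in pred) for g in gold)
    lenient = _f1(pred_hits / len(pred), gold_hits / len(gold))
    return strict, lenient


# Hypothetical annotations for a single note.
gold_spans = [(0, 14, "Smoking"), (20, 30, "Education"), (40, 52, "Employment")]
pred_spans = [(0, 14, "Smoking"), (22, 30, "Education"), (60, 65, "Smoking")]

strict_f1, lenient_f1 = strict_lenient_f1(gold_spans, pred_spans)
print(f"strict F1 = {strict_f1:.4f}, lenient F1 = {lenient_f1:.4f}")
```

In this toy example the second prediction misses the gold boundary by two characters, so it counts only under the lenient criterion. That is why a lenient score is never lower than the corresponding strict score.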