Fu Sunyang, Thorsteinsdottir Bjoerg, Zhang Xin, Lopes Guilherme S, Pagali Sandeep R, LeBrasseur Nathan K, Wen Andrew, Liu Hongfang, Rocca Walter A, Olson Janet E, St Sauver Jennifer, Sohn Sunghwan
Department of AI and Informatics, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA; University of Minnesota, Minneapolis, MN 55455, USA.
Department of Medicine, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA.
Int J Med Inform. 2022 Mar 7;162:104736. doi: 10.1016/j.ijmedinf.2022.104736.
Falls are a leading cause of unintentional injury in the elderly. Electronic health records (EHRs) offer the unique opportunity to develop models that can identify fall events. However, identifying fall events in clinical notes requires advanced natural language processing (NLP) to simultaneously address multiple issues because the word "fall" is a typical homonym.
We implemented a context-aware language model, Bidirectional Encoder Representations from Transformers (BERT), to identify falls from EHR text, and further fused the BERT model into a hybrid architecture coupled with post-hoc heuristic rules to enhance performance. The models were evaluated on real-world EHR data and compared to conventional rule-based and deep learning models (CNN and Bi-LSTM). To better understand the ability of each approach to identify falls, we further categorized fall-related concepts (i.e., risk of fall, prevention of fall, homonyms) and performed a detailed error analysis.
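The sketch below illustrates the general shape of such a hybrid pipeline: a BERT sentence classifier whose predictions are adjusted by post-hoc heuristic rules. It is a minimal, hypothetical example only; the checkpoint name, label convention, and rule patterns are assumptions for illustration, not the authors' actual implementation.

```python
# Hypothetical sketch of a hybrid BERT + post-hoc rule pipeline for fall detection.
# The model checkpoint, label names, and rule patterns are placeholders.
import re
from transformers import pipeline

# Placeholder checkpoint; the study fine-tunes BERT on annotated EHR sentences.
classifier = pipeline("text-classification", model="bert-base-uncased")

# Illustrative post-hoc heuristics: suppress non-event senses of "fall"
# (e.g., fall-risk screening, fall prevention education, the season "fall").
NEGATIVE_PATTERNS = [
    r"\brisk of fall",
    r"\bfall (risk|precautions|prevention)\b",
    r"\bfall (of|in) \d{4}\b",   # season/year mention, not a fall event
]

def detect_fall_event(sentence: str) -> bool:
    """Classify a sentence with BERT, then apply rule-based overrides."""
    pred = classifier(sentence)[0]            # {'label': ..., 'score': ...}
    is_fall = pred["label"] == "LABEL_1"      # assume LABEL_1 = fall event
    if any(re.search(p, sentence, re.IGNORECASE) for p in NEGATIVE_PATTERNS):
        is_fall = False                       # post-hoc rule corrects the BERT output
    return is_fall

print(detect_fall_event("Patient fell in the bathroom last night and hit her head."))
```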
The hybrid model achieved the highest F1-score at the sentence (0.971), document (0.985), and patient (0.954) levels. At the sentence level (the basic data unit in the model), the hybrid model achieved a sensitivity, specificity, positive predictive value, and negative predictive value of 0.954, 1.000, 0.988, and 0.999, respectively. The error analysis showed that machine learning-based approaches outperformed the rule-based approach in challenging cases that required contextual understanding. The context-aware language model (BERT) slightly outperformed the word-embedding approach trained with Bi-LSTM. No single model yielded the best performance across all fall-related semantic categories.
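For reference, the sentence-level metrics reported above all derive from a binary confusion matrix. The short sketch below shows the standard definitions; the counts used in the example are hypothetical and are not the study's data.

```python
# Standard binary classification metrics from a confusion matrix.
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    sensitivity = tp / (tp + fn)   # recall for the fall class
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "f1": f1}

# Hypothetical example counts, for illustration only.
print(classification_metrics(tp=480, fp=6, fn=23, tn=9500))
```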
A context-aware language model (BERT) was able to identify challenging fall events that require contextual understanding in EHR free text. The hybrid model combined with post-hoc rules allowed custom corrections of the BERT outputs and further improved the performance of fall detection.