Section for Biomedical Informatics and Data Science, Yale University School of Medicine, 300 George St, New Haven, 06511, USA; Department of Emergency Medicine, Yale University School of Medicine, 464 Congress Ave #260, New Haven, 06519, USA; Program of Computational Biology and Bioinformatics, Yale University, 300 George St, New Haven, 06511, USA.
Department of Emergency Medicine, Yale University School of Medicine, 464 Congress Ave #260, New Haven, 06519, USA.
J Biomed Inform. 2023 May;141:104360. doi: 10.1016/j.jbi.2023.104360. Epub 2023 Apr 14.
Physician progress notes are frequently organized into Subjective, Objective, Assessment, and Plan (SOAP) sections. The Assessment section synthesizes information recorded in the Subjective and Objective sections, and the Plan section documents tests and treatments intended to narrow the differential diagnosis and manage symptoms. Classifying the relationship between the Assessment and Plan sections has been suggested to provide valuable insight into clinical reasoning. In this work, we use a novel human-in-the-loop pipeline to classify the relationships between the Assessment and Plan sections of SOAP notes as part of the n2c2 2022 Track 3 Challenge. In particular, we use a clinical information model constructed from both the entailment logic expected by the Challenge and the problem-oriented medical record. This information model is used to label named entities as primary and secondary problems/symptoms, events, and complications across all four SOAP sections. We iteratively train separate Named Entity Recognition (NER) models and use them to annotate entities in all notes and sections. We then fine-tune a downstream RoBERTa-large model to classify the Assessment-Plan relationship. We evaluate multiple language model architectures, preprocessing parameters, and methods of knowledge integration, achieving a maximum macro-F1 score of 82.31%. Our initial model achieved top-2 performance during the Challenge (macro-F1: 81.52%; competitors' macro-F1 range: 74.54%-82.12%). We improved our model by incorporating post-challenge annotations of the Subjective and Objective (S&O) sections, outperforming the top model from the Challenge. We also used Shapley additive explanations (SHAP) to investigate the extent of the language model's clinical logic, viewed through the lens of our clinical information model. We find that the model often relies on shallow heuristics and nonspecific attention when making predictions, suggesting that language model knowledge integration requires further research.
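The knowledge-integration step described above, in which NER-predicted entities are annotated before the Assessment-Plan pair is passed to the relation classifier, might be sketched as follows. The inline marker scheme, label names, and separator format here are illustrative assumptions, not the paper's exact representation:

```python
# Sketch (assumed scheme): wrap NER-predicted character spans in inline marker
# tokens, then concatenate the tagged Assessment and Plan texts as a sequence
# pair for a RoBERTa-style classifier.

def tag_entities(text, spans):
    """Wrap (start, end, label) character spans with [LABEL] ... [/LABEL] markers.

    Spans are assumed non-overlapping; they are processed in order of start offset.
    """
    out, last = [], 0
    for start, end, label in sorted(spans):
        out.append(text[last:start])
        out.append(f"[{label}] {text[start:end]} [/{label}]")
        last = end
    out.append(text[last:])
    return "".join(out)

def build_pair(assessment, plan, sep=" </s></s> "):
    """Render the two tagged sections the way a RoBERTa sequence-pair input is
    typically written out (the separator string is an assumption)."""
    return assessment + sep + plan

# Hypothetical example: one primary-problem span in the Assessment text.
assessment = "Acute chest pain, rule out MI."
tagged = tag_entities(assessment, [(0, 16, "PRIMARY-PROBLEM")])
pair = build_pair(tagged, "Serial troponins and EKG.")
print(pair)
```

In this scheme the downstream classifier sees the entity roles as explicit tokens rather than having to infer them, which is one common way to inject structured clinical knowledge into a transformer input.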
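Macro-F1, the Challenge's ranking metric, is the unweighted mean of per-class F1 scores, so rare relation classes count as much as common ones. A minimal pure-Python version (the example labels below are made up for illustration, not the Challenge's actual label set):

```python
# Minimal macro-F1: average the per-class F1 over all classes, unweighted.

def macro_f1(y_true, y_pred):
    labels = set(y_true) | set(y_pred)
    f1s = []
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical labels: one "a" misclassified as "b".
print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))  # ~0.733
```

Because the mean is unweighted, a system cannot reach a high score by doing well only on the majority relation class.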