Peterson Kevin J, Jiang Guoqian, Liu Hongfang
Department of Information Technology, Mayo Clinic, Rochester, MN 55905, United States; Bioinformatics and Computational Biology Program, University of Minnesota, Minneapolis, MN 55455, United States.
Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, United States; Bioinformatics and Computational Biology Program, University of Minnesota, Minneapolis, MN 55455, United States.
J Biomed Inform. 2020 Oct;110:103541. doi: 10.1016/j.jbi.2020.103541. Epub 2020 Aug 16.
Free-text problem descriptions are brief explanations of patient diagnoses and issues, commonly found in problem lists and other prominent areas of the medical record. These compact representations often express complex and nuanced medical conditions, making their semantics challenging to fully capture and standardize. In this study, we describe a framework for transforming free-text problem descriptions into standardized Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) models. This approach leverages a combination of domain-specific dependency parsers, Bidirectional Encoder Representations from Transformers (BERT) natural language models, and cui2vec Unified Medical Language System (UMLS) concept vectors to align extracted concepts from free-text problem descriptions into structured FHIR models. A neural network classification model is used to classify thirteen relationship types between concepts, facilitating mapping to the FHIR Condition resource. We use data programming, a weak supervision approach, to eliminate the need for a manually annotated training corpus. Shapley values, a mechanism to quantify contribution, are used to interpret the impact of model features. We found that our methods identified the focus concept, or primary clinical concern of the problem description, with an F score of 0.95. Relationships from the focus to other modifying concepts were extracted with an F score of 0.90. When classifying relationships, our model achieved a 0.89 weighted average F score, enabling accurate mapping of attributes into HL7 FHIR models. We also found that the BERT input representation predominantly contributed to the classifier decision as shown by the Shapley values analysis.
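The weighted average F score reported for relationship classification can be read as the per-class F score averaged with weights proportional to each class's support. The following is a minimal sketch of that calculation under this assumption; it is not the authors' evaluation code. The function name `weighted_f1` and the relationship labels (`site`, `severity`, `stage`) are hypothetical illustrations, not the thirteen relationship types used in the study, and the toy labels are invented for the example.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores.

    Assumes y_true and y_pred are equal-length sequences of class labels.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    support = Counter(y_true)  # number of gold instances per class
    total = len(y_true)
    score = 0.0

    for label in support:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)

        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0

        # Weight each class's F1 by its share of the gold labels.
        score += f1 * support[label] / total

    return score


# Hypothetical gold and predicted relationship labels (toy data only).
gold = ["site", "site", "severity", "severity", "severity", "stage"]
pred = ["site", "severity", "severity", "severity", "severity", "stage"]
result = weighted_f1(gold, pred)
```

Because the weights follow class frequency, frequent relationship types dominate the score; a macro average would instead weight all classes equally.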