Oklahoma State University, Stillwater, OK, United States.
National Institutes of Health Clinical Center, Bethesda, MD, United States.
Int J Med Inform. 2021 Mar;147:104351. doi: 10.1016/j.ijmedinf.2020.104351. Epub 2020 Dec 24.
Secondary use of Electronic Health Records (EHRs) has mostly focused on health conditions (diseases and drugs). Function is an important health indicator alongside morbidity and mortality, yet it has been overlooked in assessing patients' health status. The World Health Organization (WHO)'s International Classification of Functioning, Disability and Health (ICF) is considered the international standard for describing and coding function and health states. We present the first comprehensive analysis and identification of functioning concepts in the Mobility domain of the ICF.
Using physical therapy notes at the National Institutes of Health's Clinical Center, we induced a hierarchical order of mobility-related entities comprising 5 entity types, 3 relations, 8 attributes, and 33 attribute values. Two domain experts manually curated a gold standard corpus of 14,281 nested entity mentions from 400 clinical notes. Inter-annotator agreement (IAA) under exact matching averaged 92.3% F1-score on mention text spans and 96.6% Cohen's kappa on attribute assignments. A high-performance ensemble machine learning model for named entity recognition (NER) was trained and evaluated on the gold standard corpus. The average F1-score on exact entity matching of our ensemble method (84.90%) outperformed popular NER methods: Conditional Random Field (80.4%), Recurrent Neural Network (81.82%), and Bidirectional Encoder Representations from Transformers (82.33%).
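The two agreement measures reported above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes each mention is represented as a hypothetical `(start, end, label)` tuple, that "exact matching" means both offsets and the label must be identical, and that attribute values are compared as two aligned lists of labels. The function names and example labels are illustrative only.

```python
from collections import Counter


def exact_match_f1(gold, pred):
    """F1 over mentions, counting a hit only when (start, end, label) is identical.

    gold, pred: iterables of (start, end, label) tuples (assumed representation).
    """
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: labels "A" and "B" stand in for real entity types.
gold = [(0, 2, "A"), (3, 5, "B"), (6, 9, "A")]
pred = [(0, 2, "A"), (3, 5, "B"), (6, 8, "A"), (10, 12, "B")]
span_f1 = exact_match_f1(gold, pred)

annotator_1 = ["x", "x", "y", "y"]
annotator_2 = ["x", "x", "y", "x"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Under exact matching, the third predicted mention above counts as an error even though it overlaps a gold mention, which makes exact-match F1 a strict measure for nested entity annotation.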
The results of this study show that mobility functioning information can be reliably captured from clinical notes once adequate resources are provided for sequence labeling methods. We expect that functioning concepts in other domains of the ICF can be identified in a similar fashion.