Leveraging weak supervision to perform named entity recognition in electronic health records progress notes to identify the ophthalmology exam.

Affiliations

Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, CA, USA.

Johns Hopkins School of Medicine, Baltimore, MD, USA.

Publication information

Int J Med Inform. 2022 Nov;167:104864. doi: 10.1016/j.ijmedinf.2022.104864. Epub 2022 Sep 16.

DOI:10.1016/j.ijmedinf.2022.104864

PMID:36179600

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9901505/

Abstract

OBJECTIVE: To develop deep learning models that recognize ophthalmic examination components in clinical notes from electronic health records (EHR) using a weak supervision approach.

METHODS: A corpus of 39,099 ophthalmology notes weakly labeled for 24 examination entities was assembled from the EHR of one academic center. Four pre-trained transformer-based language models (DistilBert, BioBert, BlueBert, and ClinicalBert) were fine-tuned for this named entity recognition task and compared with a baseline regular expression model. Models were evaluated on the weakly labeled test dataset, a human-labeled sample of that set, and a human-labeled independent dataset.

RESULTS: On the weakly labeled test set, all transformer-based models had recall > 0.93, with precision ranging from 0.815 to 0.843. The baseline model had lower recall (0.769) and precision (0.682). On the human-annotated sample, the baseline model had high recall (0.962, 95% CI 0.955-0.967) with precision varying widely across entities (0.081-0.999). The Bert models had recall ranging from 0.771 to 0.831 and precision >= 0.973. On the independent dataset, BlueBert had precision of 0.926 and recall of 0.458. The baseline model had better recall (0.708, 95% CI 0.674-0.738) but worse precision (0.399, 95% CI 0.352-0.451).

CONCLUSION: We developed the first deep learning system to recognize eye examination components from clinical notes, leveraging a novel opportunity for weak supervision. Transformer-based models had high precision on human-annotated labels, whereas the baseline model had poor precision but higher recall. This system may be used to improve cohort and feature identification using free-text notes. Our weakly supervised approach may help amass large datasets of domain-specific entities from EHRs in many fields.
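The precision and recall values reported in the abstract are named entity recognition metrics. As a rough illustration only, the sketch below computes both from sets of predicted and reference entity spans. It assumes exact-match scoring on (start, end, label) triples, which the abstract does not specify; the labels and offsets are hypothetical and are not taken from the study.

```python
from typing import Set, Tuple

# A span is (start_offset, end_offset, entity_label).
# All spans and labels below are hypothetical, for illustration only.
Span = Tuple[int, int, str]


def precision_recall(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Exact-match, entity-level precision and recall (an assumed criterion)."""
    true_positives = len(predicted & gold)
    # Precision: share of predicted entities that are in the reference set.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of reference entities that were predicted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical reference annotations and model predictions for one note.
gold = {(0, 5, "va"), (10, 14, "iop"), (20, 28, "cornea"), (30, 34, "lens")}
predicted = {(0, 5, "va"), (10, 14, "iop"), (40, 45, "lens")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f}")
```

In this toy case the predictions are mostly correct but miss half of the reference entities, the same high-precision, lower-recall pattern the abstract reports for the transformer models on human-annotated labels.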
