Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, CA, USA.
Johns Hopkins School of Medicine, Baltimore, MD, USA.
Int J Med Inform. 2022 Nov;167:104864. doi: 10.1016/j.ijmedinf.2022.104864. Epub 2022 Sep 16.
OBJECTIVE: To develop deep learning models that recognize ophthalmic examination components in clinical notes from electronic health records (EHR) using a weak supervision approach.
METHODS: A corpus of 39,099 ophthalmology notes weakly labeled for 24 examination entities was assembled from the EHR of one academic center. Four pre-trained transformer-based language models (DistilBert, BioBert, BlueBert, and ClinicalBert) were fine-tuned for this named entity recognition task and compared to a baseline regular expression model. Models were evaluated on the weakly labeled test dataset, a human-labeled sample of that set, and a human-labeled independent dataset.
RESULTS: On the weakly labeled test set, all transformer-based models had recall > 0.93, with precision varying from 0.815 to 0.843. The baseline model had lower recall (0.769) and precision (0.682). On the human-annotated sample, the baseline model had high recall (0.962, 95% CI 0.955-0.967) with variable precision across entities (0.081-0.999). The Bert models had recall ranging from 0.771 to 0.831 and precision >= 0.973. On the independent dataset, BlueBert had precision of 0.926 and recall of 0.458. The baseline model had better recall (0.708, 95% CI 0.674-0.738) but worse precision (0.399, 95% CI 0.352-0.451).
CONCLUSION: We developed the first deep learning system to recognize eye examination components from clinical notes, leveraging a novel opportunity for weak supervision. Transformer-based models had high precision on human-annotated labels, whereas the baseline model had poor precision but higher recall. This system may be used to improve cohort and feature identification using free-text notes. Our weakly supervised approach may help amass large datasets of domain-specific entities from EHRs in many fields.
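To make the evaluation concrete, the following is a minimal sketch of how a regular expression baseline could tag entities and how span-level precision and recall could be computed against reference labels. It is not the authors' implementation: the two patterns, the entity names `VA` and `IOP`, the example note, and the helper names `regex_tag` and `precision_recall` are all hypothetical, and exact span matching is an assumed scoring rule.

```python
import re

# Hypothetical patterns for two illustrative entities (the study covered 24).
PATTERNS = {
    "VA": re.compile(r"\b(?:VA|visual acuity)\b", re.IGNORECASE),
    "IOP": re.compile(r"\b(?:IOP|intraocular pressure)\b", re.IGNORECASE),
}


def regex_tag(text):
    """Return the set of (entity, start, end) spans matched by the patterns."""
    return {
        (label, match.start(), match.end())
        for label, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    }


def precision_recall(predicted, gold):
    """Span-level precision and recall, counting only exact span matches."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up note: "Tapp" (applanation tonometry) is labeled as IOP in the
# reference set but is not covered by the hypothetical patterns.
note = "VA 20/20 OD. IOP 15 mmHg. Tapp 14 OS."
gold = {("VA", 0, 2), ("IOP", 13, 16), ("IOP", 26, 30)}

predicted = regex_tag(note)
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy case every predicted span is correct but one reference span is missed, so precision is perfect while recall is not. The same two functions could score the output of a fine-tuned token classification model once its predictions are converted to spans.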