Department of Artificial Intelligence and Informatics, Mayo Clinic, 200 First St SW, Rochester, MN, 55905, USA.
Precision Population Science Lab, Department of Pediatric and Adolescent Medicine, Mayo Clinic, Rochester, MN, USA.
BMC Med Inform Decis Mak. 2021 Nov 9;21(Suppl 7):272. doi: 10.1186/s12911-021-01633-4.
BACKGROUND: There is significant variability in guideline-concordant documentation in asthma care. However, assessing clinicians' documentation is not feasible using structured data alone; it requires labor-intensive chart review of electronic health records (EHRs). Certain guideline elements among asthma control factors, such as reviewing inhaler technique, require contextual understanding to be captured correctly from EHR free text.
METHODS: The study data consist of two sets: (1) manually chart-reviewed data, comprising 1039 clinical notes from 300 patients with an asthma diagnosis, and (2) weakly labeled data (distant supervision), comprising 27,363 clinical notes from 800 patients with an asthma diagnosis. A context-aware language model, Bidirectional Encoder Representations from Transformers (BERT), was developed to identify inhaler techniques in EHR free text. Both the original BERT and clinical BioBERT (cBERT) were applied with cost-sensitive learning to deal with imbalanced data. Distant supervision, using weak labels generated by rules, was also incorporated to augment the training set and to reduce the costly manual labeling otherwise required to develop a deep learning algorithm. A hybrid approach using post-hoc rules was also explored to fix BERT model errors. The performance of BERT models with and without distant supervision, the hybrid models, and a rule-based model was compared in terms of precision, recall, F1-score, and accuracy.
RESULTS: The BERT models trained on the original data performed similarly to a rule-based model in F1-score (0.837, 0.845, and 0.838 for rules, BERT, and cBERT, respectively). The BERT models with distant supervision performed better (F1-scores of 0.853 and 0.880 for BERT and cBERT, respectively) than both the models without distant supervision and the rule-based model. The hybrid models performed best, with F1-scores of 0.877 and 0.904, improving on the distant-supervision BERT and cBERT models, respectively.
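The Methods name two training-time techniques without detail: rule-generated weak labels for distant supervision and cost-sensitive learning for class imbalance. The following is a minimal sketch of both ideas under stated assumptions. The regex patterns in `POSITIVE_PATTERNS` are hypothetical (the abstract does not give the study's actual rules), and the inverse-frequency scheme in `class_weights` is one common cost-sensitive weighting, not necessarily the one the authors used. The weighted loss is written in plain Python rather than inside a BERT training loop.

```python
import math
import re
from collections import Counter

# Hypothetical weak-labeling rules; the study's actual rules are not given in the abstract.
POSITIVE_PATTERNS = [
    re.compile(r"\binhaler\s+technique\b", re.IGNORECASE),
    re.compile(r"\b(demonstrated|reviewed)\b.*\b(inhaler|spacer)\b", re.IGNORECASE),
]


def weak_label(sentence: str) -> int:
    """Distant supervision: label 1 if any rule matches, else 0 (noisy by design)."""
    return int(any(p.search(sentence) for p in POSITIVE_PATTERNS))


def class_weights(labels: list[int]) -> dict[int, float]:
    """Inverse-frequency weights n / (k * count_c), so the rarer class costs more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}


def weighted_cross_entropy(probs: list[list[float]], labels: list[int],
                           weights: dict[int, float]) -> float:
    """Average of -w[y] * log p(y); errors on a heavily weighted class cost more."""
    losses = [-weights[y] * math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


notes = [
    "Reviewed inhaler technique with patient.",
    "Patient reports cough at night.",
    "Demonstrated proper use of the spacer and inhaler.",
    "Follow up in 3 months.",
    "Refilled albuterol.",
]
labels = [weak_label(s) for s in notes]
weights = class_weights(labels)
print(labels)   # [1, 0, 1, 0, 0]
print(weights)  # weight 1.25 for class 1, 5/6 for class 0
```

Rule-based labels are cheap to produce at the scale of the weakly labeled set but inherit the rules' errors; the class weights then make the minority (positive) class count more in the loss.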
CONCLUSIONS: The proposed BERT models with distant supervision demonstrated their capability to identify inhaler techniques in EHR free text and outperformed both the rule-based model and the BERT models trained on the original data. With a distant supervision approach, we may reduce the costly manual chart review otherwise needed to generate the large training datasets required by most deep learning-based models. A hybrid model was able to fix BERT model errors and further improve performance.
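The hybrid correction step and the evaluation metrics can be sketched as follows. This is a minimal illustration, assuming a hybrid model in which high-precision post-hoc rules override the BERT prediction whenever they fire; the patterns and the names `hybrid_predict` and `evaluate` are hypothetical, since the abstract does not describe the actual rules. The metrics use the standard binary definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = 2PR / (P + R), and accuracy = (TP + TN) / N.

```python
import re

# Hypothetical post-hoc rules; the study's actual hybrid rules are not given in the abstract.
FORCE_NEGATIVE = re.compile(r"\b(no|not)\s+(reviewed|demonstrated)\b", re.IGNORECASE)
FORCE_POSITIVE = re.compile(r"\binhaler\s+technique\b", re.IGNORECASE)


def hybrid_predict(sentence: str, model_pred: int) -> int:
    """Override the model's 0/1 prediction when a (hypothetical) rule fires."""
    if FORCE_NEGATIVE.search(sentence):  # negation rule takes precedence
        return 0
    if FORCE_POSITIVE.search(sentence):
        return 1
    return model_pred  # no rule fired: keep the model's prediction


def evaluate(gold: list[int], pred: list[int]) -> dict[str, float]:
    """Binary precision, recall, F1-score, and accuracy (positive class = 1)."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": (tp + tn) / len(pairs)}


sentences = [
    "Reviewed inhaler technique with patient.",  # model misses it
    "Inhaler technique not reviewed today.",     # model false positive
    "Patient uses albuterol as needed.",
    "Demonstrated correct spacer use.",
]
gold = [1, 0, 0, 1]
model_preds = [0, 1, 0, 1]
hybrid_preds = [hybrid_predict(s, p) for s, p in zip(sentences, model_preds)]

print(evaluate(gold, model_preds))   # all four metrics are 0.5
print(evaluate(gold, hybrid_preds))  # all four metrics are 1.0
```

Checking the negation rule before the positive rule is a design choice in this sketch: a sentence such as "Inhaler technique not reviewed today" contains the positive phrase, so the order decides which rule wins.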