Predicting COVID-19 Symptoms From Free Text in Medical Records Using Artificial Intelligence: Feasibility Study.

Authors

Van Olmen Josefien, Van Nooten Jens, Philips Hilde, Sollie Annet, Daelemans Walter

Affiliations

Department of Family Medicine and Population Health, University of Antwerp, Antwerp, Belgium.

Computational Linguistics, Psycholinguistics and Sociolinguistics Research Centre, University of Antwerp, Antwerp, Belgium.

Publication information

JMIR Med Inform. 2022 Apr 27;10(4):e37771. doi: 10.2196/37771.

DOI:10.2196/37771

PMID:35442903

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9049643/

Abstract

BACKGROUND

Electronic medical records have opened opportunities to analyze clinical practice at large scale. Structured registries and coding procedures such as the International Classification of Primary Care have further improved such analyses. However, a large part of the information about the state of the patient and the doctor's observations is still entered in free text fields. The main function of those fields is to report the doctor's line of thought, to remind the doctor and colleagues of follow-up actions, and to provide accountability for clinical decisions. These fields contain rich information that can complement the information in coded fields, yet until now they have hardly been used for analysis.

OBJECTIVE

This study aims to develop a prediction model that converts the free text information on COVID-19-related symptoms in out-of-hours care electronic medical records into usable symptom-based data that can be analyzed at large scale.

METHODS

The design was a feasibility study in which we examined the content of the raw data, the steps and methods for modelling, and the precision and accuracy of the models. A prediction model for 27 preidentified COVID-19-relevant symptoms was developed on a data set derived from the database of primary care out-of-hours consultations in Flanders. The task was framed as multiclass, multilabel text classification. We tested two approaches: (1) a classical machine learning-based text categorization approach, Binary Relevance, and (2) a deep neural network approach with BERTje, including a domain-adapted version. Ethical approval was obtained from the Institutional Review Board of the Institute of Tropical Medicine and the ethics committee of the University Hospital of Antwerp (ref 20/50/693).
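To make the Binary Relevance idea concrete, the following is a minimal sketch rather than the authors' implementation. Binary Relevance trains one independent yes/no classifier per symptom label and returns the set of labels whose classifier fires. The abstract does not name the base classifier or the features, so the naive Bayes model, the whitespace tokenizer, the English toy notes, and the three label names used here are all illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization.
    return text.lower().split()


class BinaryNaiveBayes:
    """Multinomial naive Bayes for a single yes/no label (assumed base model)."""

    def fit(self, docs, targets):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter(targets)
        for doc, y in zip(docs, targets):
            self.word_counts[y].update(tokenize(doc))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, doc):
        total = sum(self.class_counts.values())
        scores = {}
        for y in (0, 1):
            # Laplace-smoothed log prior plus log likelihood of known tokens.
            score = math.log((self.class_counts[y] + 1) / (total + 2))
            denom = sum(self.word_counts[y].values()) + len(self.vocab)
            for tok in tokenize(doc):
                if tok in self.vocab:
                    score += math.log((self.word_counts[y][tok] + 1) / denom)
            scores[y] = score
        return 1 if scores[1] > scores[0] else 0


class BinaryRelevance:
    """One independent binary classifier per label."""

    def fit(self, docs, label_sets, labels):
        self.models = {
            label: BinaryNaiveBayes().fit(
                docs, [int(label in s) for s in label_sets]
            )
            for label in labels
        }
        return self

    def predict(self, doc):
        return {lab for lab, m in self.models.items() if m.predict(doc) == 1}


# Hypothetical toy notes and labels, invented for illustration only.
docs = ["fever and cough", "high fever", "dry cough", "sore throat"]
label_sets = [{"fever", "cough"}, {"fever"}, {"cough"}, {"sore_throat"}]
labels = ["fever", "cough", "sore_throat"]

model = BinaryRelevance().fit(docs, label_sets, labels)
print(model.predict("fever and cough"))
```

Because each label is modelled separately, the approach scales linearly with the number of labels but cannot exploit correlations between symptoms, which is one common motivation for comparing it against a jointly trained neural model.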

RESULTS

The sample set comprised 3957 fields. After cleaning, 2313 could be used for the experiments. Of the 2313 fields, 85% (n=1966) were used to train the model and 15% (n=347) for testing. The standard BERTje model performed best on the data. It reached a weighted F1 score of 0.70 and an exact match ratio (accuracy score) of 0.38, that is, the proportion of instances for which the model identified all correct codes. The other models achieved respectable results as well, with weighted F1 scores ranging from 0.59 to 0.70. The Binary Relevance method performed best on the data without a frequency threshold. As for the individual codes, the domain-adapted version of BERTje performed better on several of the less common objective codes, while BERTje reached higher F1 scores for the least common labels especially, and for most other codes in general.
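The two headline metrics can be illustrated with a short sketch. It assumes the usual multilabel definitions: the exact match ratio counts an instance as correct only when the predicted label set equals the true label set, and the weighted F1 averages per-label F1 scores weighted by each label's number of true occurrences (its support). The toy label sets are invented and are unrelated to the study data.

```python
def exact_match_ratio(true_sets, pred_sets):
    # Share of instances whose predicted label set is exactly right.
    matches = sum(1 for t, p in zip(true_sets, pred_sets) if t == p)
    return matches / len(true_sets)


def weighted_f1(true_sets, pred_sets, labels):
    # Per-label F1, averaged with weights equal to each label's support.
    weighted_sum = 0.0
    total_support = 0
    for label in labels:
        pairs = list(zip(true_sets, pred_sets))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn
        weighted_sum += support * f1
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


# Hypothetical gold and predicted label sets for four notes.
true_sets = [{"fever", "cough"}, {"fever"}, {"cough"}, {"sore_throat"}]
pred_sets = [{"fever"}, {"fever"}, {"cough"}, {"cough"}]
labels = ["fever", "cough", "sore_throat"]

emr = exact_match_ratio(true_sets, pred_sets)
wf1 = weighted_f1(true_sets, pred_sets, labels)
print(emr, wf1)
```

In this toy case the exact match ratio (0.5) is lower than the weighted F1 (0.6) because a partially correct prediction earns per-label credit but no exact-match credit, which is the same pattern as the gap between 0.38 and 0.70 reported above.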

CONCLUSIONS

The artificial intelligence model BERTje can reliably predict COVID-19-related information from medical records using text mining from the free text fields generated in primary care settings. This feasibility study invites researchers to examine further possibilities to use primary care routine data.
