VA Salt Lake City Health Care System, US Department of Veterans Affairs, Salt Lake City, UT, United States.
Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, UT, United States.
JMIR Public Health Surveill. 2021 Mar 24;7(3):e26719. doi: 10.2196/26719.
Patient travel history can be crucial in evaluating evolving infectious disease events. Such information can be challenging to acquire in electronic health records, as it is often available only in unstructured text.
This study aims to assess the feasibility of annotating and automatically extracting travel history mentions from unstructured clinical documents in the Department of Veterans Affairs across disparate health care facilities and among millions of patients. Information about travel exposure augments existing surveillance applications for increased preparedness in responding quickly to public health threats.
Clinical documents related to arboviral disease were selected using a semiautomated bootstrapping process and then annotated. Using the annotated instances as training data, models were developed to extract from unstructured clinical text any mention of affirmed travel locations outside of the continental United States. Automated text processing models, including machine learning and neural language models, were evaluated for extraction accuracy.
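To make the extraction task concrete, the following is a minimal rule-based sketch of separating affirmed from negated travel mentions. It is not the study's method, which used trained machine learning and neural language models; the location list, cue patterns, and function name are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical gazetteer and cue patterns, for illustration only.
LOCATIONS = ["Brazil", "Puerto Rico", "Thailand", "Kenya"]
TRAVEL_CUES = r"\b(?:travel(?:ed|led|ing|ling|s)?|trip|returned from|visited)\b"
NEGATION_CUES = r"\b(?:no|not|denies|denied|without)\b"


def extract_travel_mentions(text):
    """Return (location, status) pairs, where status is 'affirmed' or 'negated'."""
    mentions = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Only consider sentences that contain a travel cue.
        if not re.search(TRAVEL_CUES, sentence, flags=re.IGNORECASE):
            continue
        # Treat the whole sentence as negated if any negation cue appears in it.
        negated = re.search(NEGATION_CUES, sentence, flags=re.IGNORECASE) is not None
        for location in LOCATIONS:
            pattern = r"\b" + re.escape(location) + r"\b"
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                mentions.append((location, "negated" if negated else "affirmed"))
    return mentions


note = "Patient returned from Brazil last week. Denies travel to Kenya."
print(extract_travel_mentions(note))
```

Sentence-level negation is a deliberate simplification here; a sentence that mixes an affirmed and a negated location would be mislabeled, which is one reason trained models are preferable for this task.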
Among 4584 annotated instances, 2659 (58%) contained an affirmed mention of travel history, while 347 (7.6%) were negated. Interannotator agreement resulted in a document-level Cohen kappa of 0.776. Automated text processing accuracy (F1 85.6, 95% CI 82.5-87.9) and computational burden were acceptable such that the system can provide a rapid screen for public health events.
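For readers less familiar with the two metrics reported above, the sketch below shows how Cohen kappa and the F1 score are conventionally computed. The function names and toy inputs are assumptions for illustration; they are not the study's data or evaluation code, and the abstract does not state how its confidence interval was derived.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen kappa for two annotators labeling the same items."""
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over categories of the product of marginal rates.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Toy document-level labels (1 = travel mentioned, 0 = not), illustrative only.
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
print(cohen_kappa(annotator_1, annotator_2))
print(f1_score(true_positives=8, false_positives=2, false_negatives=2))
```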
Automated extraction of patient travel history from clinical documents is feasible for enhancing passive public health surveillance systems. Without such a system, it would usually be necessary to manually review charts to identify recent travel or lack of travel, use an electronic health record that enforces travel history documentation, or ignore this potential source of information altogether. The development of this tool was initially motivated by emergent arboviral diseases. More recently, this system was used in the early phases of the response to COVID-19 in the United States, although its utility was limited to a relatively brief window because of the rapid domestic spread of the virus. Such systems may aid future efforts to prevent and contain the spread of infectious diseases.