Deady Matthew, Ezzeldin Hussein, Cook Kerry, Billings Douglas, Pizarro Jeno, Plotogea Amalia A, Saunders-Hastings Patrick, Belov Artur, Whitaker Barbee I, Anderson Steven A
IBM, Washington, DC, United States.
US Food and Drug Administration, Silver Spring, MD, United States.
Front Digit Health. 2021 Dec 22;3:777905. doi: 10.3389/fdgth.2021.777905. eCollection 2021.
The Food and Drug Administration Center for Biologics Evaluation and Research conducts post-market surveillance of biologic products to ensure their safety and effectiveness. Studies have found that common vaccine exposures may be missing from structured data elements of electronic health records (EHRs) and are instead captured in clinical notes. This gap impairs monitoring of adverse events following immunization (AEFIs). COVID-19 vaccines, for example, have regularly been administered outside traditional medical settings.

We developed a natural language processing (NLP) algorithm to mine unstructured clinical notes for vaccinations not captured in structured EHR data. A random sample of 1,000 influenza vaccine administrations, representing 995 unique patients, was extracted from a large U.S. EHR database. NLP techniques were used to detect administrations from the clinical notes in the training dataset [80% (n = 797) of patients]. The algorithm was then applied to the validation dataset [20% (n = 198) of patients] to assess performance. Clinicians reviewed the full medical charts for 28 randomly selected administration events in the validation dataset. Finally, the NLP algorithm was applied across the entire dataset (n = 995) to quantify the number of additional events identified.

A total of 3,199 administrations were identified in the structured data and clinical notes combined. Of these, 2,740 (85.7%) were identified in the structured data, while the NLP algorithm identified 1,183 (37.0%) administrations in clinical notes, 459 of which were not captured in the structured data. This represents a 16.8% increase (459 of 2,740) in identified vaccine administrations compared with using structured data alone. Chart review of the 28 validation administrations confirmed 27 (96.4%) as "definite" vaccine administrations; 18 (64.3%) had evidence of a vaccination event in the structured data, while 10 (35.7%) were found solely in the unstructured notes.
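The abstract reports that training and validation sets were defined as 80% and 20% of *patients*, not of administrations (1,000 administrations across 995 patients). The abstract does not describe the splitting procedure, so the following is only a hypothetical sketch of a patient-level split, in which all administrations belonging to one patient land on the same side. The function name `split_by_patient`, the seeded shuffle, and the toy records are illustrative assumptions. Note that an exact 80% cut of 995 patients gives 796/199, so the reported 797/198 implies the authors' procedure differed in some unreported detail.

```python
import random
from collections import defaultdict


def split_by_patient(records, train_frac=0.8, seed=0):
    """Hypothetical patient-level split: no patient appears in both sets.

    records: iterable of (patient_id, administration) pairs.
    Returns (train_records, valid_records).
    """
    # Group administrations by patient so a patient's events stay together.
    by_patient = defaultdict(list)
    for patient_id, administration in records:
        by_patient[patient_id].append(administration)

    # Shuffle patient IDs reproducibly, then cut at train_frac.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_train = round(train_frac * len(patients))

    train = [(p, a) for p in patients[:n_train] for a in by_patient[p]]
    valid = [(p, a) for p in patients[n_train:] for a in by_patient[p]]
    return train, valid


if __name__ == "__main__":
    # Toy data shaped like the study sample: 995 patients, 1,000 administrations
    # (five hypothetical patients have two administrations each).
    toy = [(i, f"adm-{i}") for i in range(995)] + [(i, f"adm-{i}-b") for i in range(5)]
    train, valid = split_by_patient(toy)
    print(len({p for p, _ in train}), len({p for p, _ in valid}))
```

Splitting on patients rather than on individual administrations keeps notes from the same person out of both sets, which would otherwise let the validation score benefit from patient-specific phrasing seen in training.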
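The reported percentages use two different denominators: the share captured by each source is relative to the combined total of 3,199, while the 16.8% increase is relative to the 2,740 structured-data administrations. The short sketch below recomputes every reported figure from the raw counts in the abstract; the helper names `summarize_capture` and `pct` are illustrative, not part of the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureSummary:
    total: int            # administrations found by either source
    overlap: int          # found by both structured data and NLP
    pct_structured: float # structured share of the combined total
    pct_nlp: float        # NLP share of the combined total
    pct_increase: float   # gain relative to structured data alone


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * part / whole, 1)


def summarize_capture(structured: int, nlp: int, nlp_only: int) -> CaptureSummary:
    """Derive combined totals and percentages from per-source counts."""
    if not 0 <= nlp_only <= nlp:
        raise ValueError("nlp_only must lie between 0 and nlp")
    total = structured + nlp_only  # union of the two sources
    overlap = nlp - nlp_only       # NLP events also present in structured data
    return CaptureSummary(
        total=total,
        overlap=overlap,
        pct_structured=pct(structured, total),
        pct_nlp=pct(nlp, total),
        pct_increase=pct(nlp_only, structured),  # denominator: structured only
    )


if __name__ == "__main__":
    print(summarize_capture(structured=2740, nlp=1183, nlp_only=459))
    # Chart-review proportions among the 28 validated administrations.
    print(pct(27, 28), pct(18, 28), pct(10, 28))
```

The computation also makes explicit a figure the abstract leaves implicit: 724 of the 1,183 NLP-detected administrations (1,183 minus 459) were already present in the structured data.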
We demonstrated the utility of an NLP algorithm for identifying vaccine administrations not captured in structured EHR data. NLP techniques have the potential to improve detection of otherwise unreported vaccine administrations without increasing the analysis burden on physicians or practitioners. Future applications could include refining estimates of vaccine coverage and detecting other exposures, population characteristics, and outcomes not reliably captured in structured EHR data.