Using Probabilistic Record Linkage of Structured and Unstructured Data to Identify Duplicate Cases in Spontaneous Adverse Event Reporting Systems.

Author information

Kreimeyer Kory, Menschik David, Winiecki Scott, Paul Wendy, Barash Faith, Woo Emily Jane, Alimchandani Meghna, Arya Deepa, Zinderman Craig, Forshee Richard, Botsis Taxiarchis

Affiliation

Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, 10903 New Hampshire Ave, Silver Spring, MD, 20993-0002, USA.

Publication information

Drug Saf. 2017 Jul;40(7):571-582. doi: 10.1007/s40264-017-0523-4.

Abstract

INTRODUCTION

Duplicate case reports in spontaneous adverse event reporting systems make it harder for medical reviewers to perform individual and aggregate safety analyses efficiently. Duplicate cases can also bias data mining by generating spurious signals of disproportional reporting of product-adverse event pairs.

OBJECTIVE

We have developed a probabilistic record linkage algorithm for identifying duplicate cases in the US Vaccine Adverse Event Reporting System (VAERS) and the US Food and Drug Administration Adverse Event Reporting System (FAERS).

METHODS

In addition to using structured field data, the algorithm incorporates the non-structured narrative text of adverse event reports by examining clinical and temporal information extracted by the Event-based Text-mining of Health Electronic Records system, a natural language processing tool. The final component of the algorithm is a novel duplicate confidence value that is calculated by a rule-based empirical approach that looks for similarities in a number of criteria between two case reports.
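As an illustration of the general idea behind probabilistic record linkage, the sketch below scores a pair of reports with Fellegi-Sunter-style log-likelihood weights. It is not the authors' algorithm: the field names, the m/u probabilities, and the `match_weight` function are all hypothetical, and the abstract does not specify which fields or weights were actually used.

```python
import math

# Hypothetical per-field parameters (assumptions, not taken from the paper):
#   m = P(field agrees | the two reports are true duplicates)
#   u = P(field agrees | the two reports are unrelated)
FIELD_PARAMS = {
    "age": (0.95, 0.05),
    "sex": (0.98, 0.50),
    "onset_date": (0.90, 0.01),
    "product": (0.97, 0.10),
}


def match_weight(report_a, report_b, params=FIELD_PARAMS):
    """Sum log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = report_a.get(field), report_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)  # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


report_1 = {"age": 34, "sex": "F", "onset_date": "2015-03-02", "product": "X"}
report_2 = {"age": 34, "sex": "F", "onset_date": "2015-03-02", "product": "X"}
report_3 = {"age": 71, "sex": "M", "onset_date": "2012-11-20", "product": "Y"}

print(match_weight(report_1, report_2))  # positive: all fields agree
print(match_weight(report_1, report_3))  # negative: all fields disagree
```

Pairs whose total weight exceeds a chosen threshold would be treated as candidate duplicates. In a design like the one the abstract describes, features extracted from the narrative text could be added as further fields, and a separate rule-based confidence value could then be used to filter the candidates.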

RESULTS

For VAERS, the algorithm identified 77% of known duplicate pairs with a precision (or positive predictive value) of 95%. For FAERS, it identified 13% of known duplicate pairs with a precision of 100%. The textual information did not improve the algorithm's automated classification for VAERS or FAERS. The empirical duplicate confidence value increased performance on both VAERS and FAERS, mainly by reducing the occurrence of false-positives.
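The two figures reported for each system correspond to recall (the share of known duplicate pairs that were found) and precision (the share of flagged pairs that are true duplicates). A minimal sketch of how such metrics can be computed over unordered report pairs follows. The `precision_recall` helper and the report identifiers are hypothetical, and the toy data is for illustration only, not drawn from VAERS or FAERS.

```python
def precision_recall(predicted_pairs, known_pairs):
    """Compare predicted duplicate pairs with a reference set of known pairs.

    Pairs are treated as unordered, so ("A", "B") equals ("B", "A").
    """
    predicted = {frozenset(p) for p in predicted_pairs}
    known = {frozenset(p) for p in known_pairs}
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall


# Toy example with made-up report identifiers.
known = [("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")]
predicted = [("B", "A"), ("C", "D"), ("E", "F")]

precision, recall = precision_recall(predicted, known)
print(precision, recall)  # 1.0 0.75
```

A result with high precision but low recall, like the FAERS figures above, means that the flagged pairs were reliable but most known duplicates were missed.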

CONCLUSIONS

The algorithm was shown to be effective at identifying pre-linked duplicate VAERS reports. The narrative text was not shown to be a key component in the automated detection evaluation; however, it is essential for supporting the semi-automated approach that is likely to be deployed at the Food and Drug Administration, where medical reviewers will perform some manual review of the most highly ranked reports identified by the algorithm.
