Comparing data mining methods on the VAERS database.
Author information
Banks David, Woo Emily Jane, Burwen Dale R, Perucci Phil, Braun M Miles, Ball Robert
Affiliation
The Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, Food and Drug Administration, Rockville, MD 20852, USA.
Publication information
Pharmacoepidemiol Drug Saf. 2005 Sep;14(9):601-9. doi: 10.1002/pds.1107.
PURPOSE
Data mining may enhance traditional surveillance of vaccine adverse events by identifying events that are reported more commonly after one vaccine than after other vaccines. Data mining methods derive a signal from the proportion of reports in which a condition, or group of conditions, is reported soon after a vaccine is administered; the signal is therefore a relative proportion compared across vaccines, not an absolute rate for the condition. The Vaccine Adverse Event Reporting System (VAERS) contains approximately 150 000 reports of adverse events that are possibly associated with vaccine administration.
METHODS
We studied four data mining techniques: empirical Bayes geometric mean (EBGM), lower-bound of the EBGM's 90% confidence interval (EB05), proportional reporting ratio (PRR), and screened PRR (SPRR). We applied these to the VAERS database and compared the agreement among methods and other performance properties, particularly focusing on the vaccine-event combinations with the highest numerical scores in the various methods.
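As an illustration of the simplest of these measures, the sketch below computes a PRR for each vaccine-event pair. It assumes the commonly used definition of the PRR (the proportion of a vaccine's reports that mention the event, divided by the same proportion among reports for all other vaccines); the abstract does not state the exact formula or the screening criteria used for the SPRR, so neither should be read off this code. The function name `prr_scores` and the one-pair-per-report input layout are hypothetical simplifications; real VAERS reports can list several vaccines and several events.

```python
from collections import Counter


def prr_scores(reports):
    """Return {(vaccine, event): PRR} for a list of (vaccine, event) pairs.

    Assumption: each report contributes exactly one (vaccine, event) pair.
    PRR = [a / (a + b)] / [c / (c + d)], where
      a = reports for this vaccine that mention the event,
      a + b = all reports for this vaccine,
      c = reports for other vaccines that mention the event,
      c + d = all reports for other vaccines.
    """
    pair_counts = Counter(reports)
    vaccine_counts = Counter(vaccine for vaccine, _ in reports)
    event_counts = Counter(event for _, event in reports)
    total = len(reports)

    scores = {}
    for (vaccine, event), a in pair_counts.items():
        n_vaccine = vaccine_counts[vaccine]   # a + b
        c = event_counts[event] - a           # event reported with other vaccines
        n_other = total - n_vaccine           # c + d
        if c == 0 or n_other == 0:
            continue  # PRR is undefined without a comparison proportion
        scores[(vaccine, event)] = (a / n_vaccine) / (c / n_other)
    return scores


# Toy data, invented purely for illustration (not VAERS counts).
toy_reports = (
    [("A", "fever")] * 3 + [("A", "rash")] * 1
    + [("B", "fever")] * 1 + [("B", "rash")] * 3
)
toy_scores = prr_scores(toy_reports)
```

In the toy data, fever appears in 3 of 4 reports for vaccine A and 1 of 4 reports for vaccine B, so the PRR for the pair (A, fever) is 3. The value reflects only relative reporting proportions and says nothing about how often fever actually follows vaccine A.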
RESULTS
The vaccine-event combinations with the highest numerical scores varied substantially among the methods. Not all combinations representing known associations appeared in the top 100 vaccine-event pairs for all methods.
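One simple way to quantify this kind of disagreement is the fraction of pairs shared by the top-k lists of two methods. The sketch below is an assumed illustration, not the comparison procedure used in the paper; the helper names `top_k` and `top_k_overlap` and the example scores are hypothetical.

```python
def top_k(scores, k):
    """Return the set of the k highest-scoring keys in a {pair: score} dict."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def top_k_overlap(scores_a, scores_b, k):
    """Fraction of the top-k pairs under method A that are also top-k under method B."""
    return len(top_k(scores_a, k) & top_k(scores_b, k)) / k


# Hypothetical scores from two methods for the same three pairs.
method_a = {"pair_x": 3.0, "pair_y": 2.0, "pair_z": 1.0}
method_b = {"pair_x": 1.0, "pair_y": 3.0, "pair_z": 2.0}
overlap = top_k_overlap(method_a, method_b, k=2)
```

Here the top two pairs are {pair_x, pair_y} under the first method and {pair_y, pair_z} under the second, so the overlap is 0.5. An overlap well below 1 at k = 100 would correspond to the substantial variation described above.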
CONCLUSIONS
The four methods differ in their ranking of vaccine-COSTART pairs. A given method may be superior in certain situations but inferior in others. This paper examines the statistical relationships among the four estimators. Determining which method is best for public health will require additional analysis that focuses on the true alarm and false alarm rates using known vaccine-event associations. Evaluating the properties of these data mining methods will help determine the value of such methods in vaccine safety surveillance.