Hochberg Alan M, Hauben Manfred, Pearson Ronald K, O'Hara Donald J, Reisinger Stephanie J, Goldsmith David I, Gould A Lawrence, Madigan David
ProSanos Corporation, Harrisburg, Pennsylvania 17102, USA.
Drug Saf. 2009;32(6):509-25. doi: 10.2165/00002018-200932060-00007.
Pharmacovigilance data-mining algorithms (DMAs) are known to generate substantial numbers of false-positive signals of disproportionate reporting (SDRs), although the standards used to define 'true positive' and 'false positive' vary.
To construct a highly inclusive reference event database of reported adverse events for a limited set of drugs; to use that database to evaluate three DMAs for their overall yield of scientifically supported adverse drug effects, with an emphasis on false-positive rates as defined by matching to the database; and to assess the overlap among the SDRs detected by the different DMAs.
A sample of 35 drugs approved by the US FDA between 2000 and 2004 was selected, including three drugs added to cover therapeutic categories not included in the original sample. We compiled a reference event database of adverse event information for these drugs from historical and current US prescribing information, from peer-reviewed literature covering 1999 through March 2006, from regulatory actions announced by the FDA and from adverse event listings in the British National Formulary. Every adverse event mentioned in these sources was entered into the database, including those with minimal evidence for causality. To provide some selectivity regarding causality, each entry was assigned a level of evidence based on the source of the information, using rules developed by the authors. Using FDA Adverse Event Reporting System data for 2002 through 2005, SDRs were identified for each drug with three DMAs, each applied at previously published signalling thresholds: an urn-model-based algorithm, the Gamma Poisson Shrinker (GPS) and the proportional reporting ratio (PRR). The absolute number and fraction of SDRs matching the reference event database at each level of evidence were determined for each report source and data-mining method. Overlap of the SDR lists among the various methods and report sources was tabulated as well.
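As an illustration of the simplest of the three methods, the PRR can be computed from a 2x2 table of report counts for one drug-event pair. The sketch below is not the authors' implementation. The function names are hypothetical, and the default thresholds (PRR of at least 2, chi-squared of at least 4, at least 3 reports) are a commonly cited convention assumed here for illustration; the abstract does not state which published thresholds were used.

```python
def prr(a, b, c, d):
    """Proportional reporting ratio for one drug-event pair.

    a: reports with the drug and the event
    b: reports with the drug and any other event
    c: reports with any other drug and the event
    d: reports with any other drug and any other event
    Assumes a + b > 0 and c > 0 (no zero-division guard in this sketch).
    """
    return (a / (a + b)) / (c / (c + d))


def chi_square(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for the 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def is_sdr(a, b, c, d, min_prr=2.0, min_chi2=4.0, min_count=3):
    """Flag a signal of disproportionate reporting.

    The default thresholds are an assumption for illustration,
    not values taken from the study.
    """
    return (
        a >= min_count
        and prr(a, b, c, d) >= min_prr
        and chi_square(a, b, c, d) >= min_chi2
    )


# Made-up counts, for illustration only.
example_prr = prr(10, 90, 100, 9800)
example_flag = is_sdr(10, 90, 100, 9800)
```

With these made-up counts, the event makes up 10% of the drug's reports but only about 1% of all other reports, so the pair is flagged. GPS and the urn model address the instability of such ratios at low counts in different ways and are not sketched here.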
The GPS algorithm had the lowest overall yield of SDRs (763), with the highest fraction of events matching the reference event database (89 SDRs, 11.7%), excluding events described in the prescribing information at the time of drug approval. The urn model yielded more SDRs (1562), with a non-significantly lower fraction matching (175 SDRs, 11.2%). PRR detected still more SDRs (3616), but with a lower fraction matching (296 SDRs, 8.2%). In terms of overlap of SDRs among algorithms, PRR uniquely detected the highest number of SDRs (2231, with 144, or 6.5%, matching), followed by the urn model (212, with 26, or 12.3%, matching) and then GPS (0 SDRs uniquely detected).
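The matching fractions quoted above follow directly from the reported counts. The short check below recomputes them; only the counts come from the abstract, and the variable and function names are illustrative.

```python
# (total SDRs, SDRs matching the reference event database), as reported above.
overall = {
    "GPS": (763, 89),
    "urn model": (1562, 175),
    "PRR": (3616, 296),
}

# SDRs detected by only one algorithm (GPS detected none uniquely).
unique = {
    "urn model": (212, 26),
    "PRR": (2231, 144),
}


def match_percent(total, matched):
    """Percentage of SDRs that match the reference database, to one decimal."""
    return round(100 * matched / total, 1)


overall_pct = {name: match_percent(t, m) for name, (t, m) in overall.items()}
unique_pct = {name: match_percent(t, m) for name, (t, m) in unique.items()}

for name, pct in overall_pct.items():
    print(f"{name}: {pct}% of all SDRs match")
for name, pct in unique_pct.items():
    print(f"{name}: {pct}% of uniquely detected SDRs match")
```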
The three DMAs studied offer significantly different tradeoffs between the number of SDRs detected and the degree to which those SDRs are supported by external evidence. Those differences may reflect choices of detection thresholds as well as features of the algorithms themselves. For all three algorithms, there is a substantial fraction of SDRs for which no external supporting evidence can be found, even when a highly inclusive search for such evidence is conducted.