Kontsioti Elpida, Maskell Simon, Pirmohamed Munir
Department of Electrical Engineering and Electronics, University of Liverpool, Liverpool, UK.
The Wolfson Center for Personalized Medicine, Center for Drug Safety Science, Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK.
Pharmacoepidemiol Drug Saf. 2023 Aug;32(8):832-844. doi: 10.1002/pds.5609. Epub 2023 Mar 26.
To evaluate the impact of multiple design criteria for reference sets that are used to quantitatively assess the performance of pharmacovigilance signal detection algorithms (SDAs) for drug-drug interactions (DDIs).
Starting from a large and diversified reference set for two-way DDIs, we generated custom-made reference sets of various sizes considering multiple design criteria (e.g., adverse event background prevalence). We assessed differences observed in the performance metrics of three SDAs when applied to FDA Adverse Event Reporting System (FAERS) data.
For some design criteria (e.g., theoretical evidence associated with positive controls), the impact on the performance metrics was negligible across the different SDAs, while others (e.g., restriction to designated medical events, event background prevalence) seemed to have opposing effects of different sizes on the Area Under the Curve (AUC) and positive predictive value (PPV) estimates.
The relative composition of reference sets can significantly impact the evaluation metrics, potentially altering the conclusions regarding which methodologies are perceived to perform best. We therefore need to select controls carefully, both to avoid misinterpreting signals triggered by confounding factors rather than true associations and to avoid biasing our evaluation by "favoring" some algorithms while penalizing others.
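To illustrate how the composition of a reference set can move AUC and PPV, the following is a minimal sketch, not the authors' method. The scores, the signal threshold of 1.0, and the function names `auc` and `ppv` are all hypothetical, chosen only for illustration. The same SDA scores are evaluated against two reference sets that share their positive controls but differ in which negative controls are included.

```python
from typing import List, Tuple

Scored = List[Tuple[float, int]]  # (SDA score, label): 1 = positive control, 0 = negative control


def auc(reference_set: Scored) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in reference_set if y == 1]
    neg = [s for s, y in reference_set if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ppv(reference_set: Scored, threshold: float) -> float:
    """PPV: share of controls flagged as signals (score >= threshold) that are positive controls."""
    flagged = [y for s, y in reference_set if s >= threshold]
    return sum(flagged) / len(flagged) if flagged else float("nan")


# Hypothetical SDA scores for illustration only (not taken from FAERS or the study).
positives: Scored = [(2.1, 1), (1.4, 1), (0.3, 1), (1.8, 1)]
negatives: Scored = [(0.2, 0), (1.6, 0), (0.1, 0), (0.5, 0), (1.5, 0), (0.0, 0)]

full_set = positives + negatives            # all negative controls
restricted_set = positives + negatives[:3]  # a design criterion drops some negative controls

for name, ref in [("full", full_set), ("restricted", restricted_set)]:
    print(f"{name}: AUC={auc(ref):.3f}, PPV={ppv(ref, threshold=1.0):.3f}")
```

In this toy example the algorithm and its scores are unchanged, yet both metrics shift when the set of negative controls changes, which is the kind of sensitivity the abstract describes.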