, Cambridge, MA, USA.
Biogen, Safety and Benefit Risk Management, Research Triangle Park, NC, USA.
Pharmaceut Med. 2024 Jul;38(4):321-329. doi: 10.1007/s40290-024-00530-1. Epub 2024 Jul 13.
Several quantitative methods have been established in pharmacovigilance to detect signals of disproportionate reporting (SDRs) from databases containing reports of adverse drug reactions (ADRs). The signal detection algorithms (SDAs) and the source of reporting per product vary, but it is unclear whether any algorithm can perform satisfactorily on data with such large sources of variance.
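As a rough illustration of what a disproportionality statistic measures, the sketch below computes the proportional reporting ratio (PRR) and the reporting odds ratio (ROR), with an approximate 95% confidence interval, from a 2x2 table of report counts. The function name, the cell labels `a`, `b`, `c`, `d`, and the example counts are hypothetical and are not taken from the study.

```python
import math

def prr_ror(a, b, c, d):
    """Disproportionality statistics from a 2x2 table of report counts.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with all other drugs and the event of interest
    d: reports with all other drugs and any other event
    (cell labels are an assumption made for this sketch)
    """
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    # Approximate 95% CI for the ROR on the log scale (standard Woolf formula).
    se_log_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - 1.96 * se_log_ror)
    upper = math.exp(math.log(ror) + 1.96 * se_log_ror)
    return prr, ror, (lower, upper)

# Made-up counts, purely illustrative.
prr, ror, (ror_lo, ror_hi) = prr_ror(a=20, b=980, c=100, d=98900)
print(f"PRR={prr:.2f}  ROR={ror:.2f}  95% CI=({ror_lo:.2f}, {ror_hi:.2f})")
```

Both statistics compare how often the event is reported for the product with how often it is reported for everything else; values well above 1 suggest disproportionate reporting, subject to whatever thresholds a given SDA applies.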
To determine the appropriate SDA for Biogen's internal Global Safety Database (GSD), given the characteristics of the database, including event frequencies, data skewness, outliers, and missing information. To compare the performance of standard approaches well accepted by industry (EBGM, EB05, PRR, and ROR) with a Biogen-developed machine learning (ML) regression decision tree (RDT) model across several Biogen products, in order to identify a champion SDA.
All data associated with seven marketed Biogen products were selected, and a historical subset of reported ADRs was considered. Six SDAs were evaluated: five common industry disproportionality methods and the RDT. The SDRs were calculated on training and test data composed of quarterly reporting intervals from 2004 to 2019. The performance measures were sensitivity, precision, time to detect new events, and frequency of detected cases, computed for each algorithm and each product. Outcomes in the test data were known a priori and could be compared directly with predicted outcomes. Validation was performed via misclassification rates. This work represents solely Biogen's internal information, intentionally chosen to serve the performance review of its signal detection systems, and the results will not necessarily generalize to other, external sources.
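The performance measures named above can be sketched as follows, assuming each algorithm yields a set of flagged events per quarter and that the set of true ADRs is known in advance. All function names, event names, and quarter labels here are hypothetical and only illustrate the definitions.

```python
def sensitivity_precision(flagged, known):
    """Sensitivity and precision of a set of flagged events against known ADRs."""
    flagged, known = set(flagged), set(known)
    true_positives = len(flagged & known)
    sensitivity = true_positives / len(known) if known else float("nan")
    precision = true_positives / len(flagged) if flagged else float("nan")
    return sensitivity, precision

def first_detection_quarter(flags_by_quarter, event):
    """Return the label of the first quarter in which `event` was flagged, else None.

    flags_by_quarter: list of (quarter_label, set_of_flagged_events),
    assumed to be in chronological order.
    """
    for quarter, flagged in flags_by_quarter:
        if event in flagged:
            return quarter
    return None

# Made-up example data.
known_adrs = {"headache", "nausea", "dizziness"}
flags_by_quarter = [
    ("2004Q1", {"rash"}),
    ("2004Q2", {"rash", "headache"}),
    ("2004Q3", {"rash", "headache", "nausea", "fatigue"}),
]

ever_flagged = set().union(*(flagged for _, flagged in flags_by_quarter))
sens, prec = sensitivity_precision(ever_flagged, known_adrs)
print(f"sensitivity={sens:.2f}  precision={prec:.2f}")
print(first_detection_quarter(flags_by_quarter, "nausea"))
```

In this toy data two of the three known ADRs are flagged (sensitivity 2/3), and two of the four flagged events are true ADRs (precision 1/2); time to detect is read off as the first quarter in which a known ADR appears among the flags.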
Several algorithms performed differently across products, but no single method dominated the others. Performance depended on the thresholds used to define a signal under the different criteria; the choice of statistic itself influenced the achievable performance only subtly. The relative performance of the RDT and the Medicines and Healthcare products Regulatory Agency (MHRA) algorithms was superior, and the two were paired across products. Precision was reduced for all methods across the products. Hence, companies evaluating signal detection approaches search for innovative methods to minimize this effect.
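To illustrate how the thresholds that define a signal drive the outcome, the sketch below applies a rule in the style commonly associated with the MHRA (PRR of at least 2, chi-square of at least 4, and at least 3 cases). These default thresholds, the uncorrected chi-square, and the example counts are assumptions made for illustration; the abstract does not state the thresholds used in the study.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table, without continuity correction."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def is_signal(a, b, c, d, min_prr=2.0, min_chi2=4.0, min_cases=3):
    """Flag a drug-event pair when all three criteria are met.

    Defaults follow the commonly cited PRR >= 2, chi-square >= 4, n >= 3 rule
    (an assumption here, not a threshold taken from the study).
    """
    prr = (a / (a + b)) / (c / (c + d))
    return a >= min_cases and prr >= min_prr and chi_square_2x2(a, b, c, d) >= min_chi2

# Made-up borderline pair: PRR is about 2.97, chi-square is about 3.81.
counts = dict(a=3, b=997, c=100, d=98900)
print(is_signal(**counts))                # not flagged at chi-square >= 4
print(is_signal(**counts, min_chi2=3.0))  # flagged once the threshold is relaxed
```

The same counts are or are not an SDR depending only on the chi-square cutoff, which is the sense in which performance depends on the signal-defining criteria rather than on the statistic alone.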
In designing signal detection systems, careful consideration should be given to the criteria used to define SDRs. The choice of disproportionality statistic does not affect the achievable range of signal detection performance, so this choice should rest mainly on ease of implementation and interpretation. The implementation of a method is specific to its accuracy. The RDT attempted to take advantage of known methods and to compare results on a per-product basis. Incorporating the many factors that influence ADRs may improve the RDT in future efforts. In this experiment, the RDT demonstrated superiority in terms of the shortest time to detect and the highest number of ADRs captured. Next steps include expanding the data to products representing other indications and testing the models in external databases to investigate the generalizability of estimates when comparing SDAs.