School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, 210003, China.
Jiangsu Center for ADR Monitoring, Nanjing, 210002, China.
BMC Med Inform Decis Mak. 2020 Feb 3;20(1):18. doi: 10.1186/s12911-020-1037-z.
Data masking is an inherent limitation of disproportionality measures in adverse drug reaction (ADR) signal detection. Previous approaches to reducing it fall roughly into three categories: data removal, regression, and stratification. However, none of these methods considers frequency differences among adverse drug event (ADE) reports, which may be an important cause of masking. This study explores a novel stratification method that minimizes the impact of frequency differences on the masking of real signals.
Reports submitted to the Chinese Spontaneous Reporting Database (CSRD) between 2010 and 2011 were selected. The overall dataset was stratified into clusters by the frequency of drugs, ADRs, and drug-event combinations (DECs), in that order. K-means clustering was used for the stratification, following the distribution of the data. The Information Component (IC) was then used for signal detection within each cluster separately. A reference database, built by extracting ADRs from drug product labeling, was used to evaluate performance in terms of Recall, Precision, and F-measure. In addition, DECs from the Adverse Drug Reactions Information Bulletin (ADRIB) issued by the CFDA were collected for further reliability evaluation.
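To make the signal-detection step concrete, the following is a minimal sketch of an IC calculation over a set of (drug, ADR) reports. It assumes the commonly used shrinkage form IC = log2((observed + 0.5) / (expected + 0.5)); the abstract does not state which IC variant or signal threshold the authors used, and the function names `information_component` and `ic_by_pair` are hypothetical. In a stratified analysis, `ic_by_pair` would be called once per cluster, so that the expected counts are computed from that cluster's reports only.

```python
import math
from collections import Counter


def information_component(n_dec, n_drug, n_adr, n_total, shrink=0.5):
    """Shrunk IC for one drug-event combination (assumed variant).

    n_dec   -- reports mentioning both the drug and the ADR (observed)
    n_drug  -- reports mentioning the drug
    n_adr   -- reports mentioning the ADR
    n_total -- all reports in the dataset (or in one cluster)
    """
    # Expected count if the drug and the ADR were reported independently.
    expected = n_drug * n_adr / n_total
    return math.log2((n_dec + shrink) / (expected + shrink))


def ic_by_pair(reports):
    """Compute the IC for every (drug, adr) pair in a list of reports."""
    reports = list(reports)
    n_total = len(reports)
    drug_counts = Counter(drug for drug, _ in reports)
    adr_counts = Counter(adr for _, adr in reports)
    pair_counts = Counter(reports)
    return {
        (drug, adr): information_component(
            n, drug_counts[drug], adr_counts[adr], n_total
        )
        for (drug, adr), n in pair_counts.items()
    }
```

A positive IC means the combination is reported more often than independence would predict. Because the expected count depends on the totals of the dataset it is computed in, the same combination can receive a different IC in a cluster than in the pooled data, which is the effect the stratification aims to exploit.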
Stratification divided the study dataset into 21 clusters, within each of which the frequencies of drugs, ADRs, or DECs were of a similar order of magnitude. Recall rose from 29.93% to 40.39% (a relative increase of 34.95%), Precision fell from 54.56% to 48.82% (a relative decrease of 10.52%), and F-measure rose from 38.65% to 44.21% (a relative increase of 14.39%). According to ADRIB issues published after 2011, 5 DECs related to Potassium Magnesium Aspartate, 61 DECs related to Levofloxacin Hydrochloride, and 26 DECs related to Cefazolin were highlighted.
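The percentage changes reported above are relative changes, not percentage-point differences. The short sketch below recomputes them from the rounded figures in the abstract, assuming the F-measure is the usual harmonic mean of Precision and Recall (F1); the helper names are hypothetical. Because the inputs are already rounded to two decimals, the recomputed F-measures agree with the reported ones only to within about 0.01 percentage points.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1, assumed definition)."""
    return 2 * precision * recall / (precision + recall)


def relative_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Figures taken from the abstract, in percent.
baseline = {"recall": 29.93, "precision": 54.56, "f_measure": 38.65}
stratified = {"recall": 40.39, "precision": 48.82, "f_measure": 44.21}

for metric in ("recall", "precision", "f_measure"):
    change = relative_change(baseline[metric], stratified[metric])
    print(f"{metric}: {change:+.2f}% relative change")

# Recompute F from the rounded Precision and Recall as a consistency check.
print(f"baseline F:   {f_measure(baseline['precision'], baseline['recall']):.3f}")
print(f"stratified F: {f_measure(stratified['precision'], stratified['recall']):.3f}")
```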
The proposed method is effective and reliable for minimizing the data masking effect in signal detection. Given the decrease in Precision, it is recommended as a supplement to, rather than a replacement for, non-stratified methods.