Chen Junheng, Han Fangfang, He Mingxiu, Shi Yiyang, Cai Yongming
School of Medical Information and Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China.
NMPA Key Laboratory for Technology Research and Evaluation of Pharmacovigilance, Guangzhou, 510006, China.
BMC Bioinformatics. 2025 Feb 17;26(1):54. doi: 10.1186/s12859-025-06053-z.
Adverse drug reactions (ADRs) are a global public health problem that seriously endangers human life and imposes a high economic burden. Predicting their occurrence and taking early, effective countermeasures is therefore of great significance. One current strategy for predicting ADRs from accessible public data is to construct an association matrix between drugs and their adverse reactions and then mine it for associations. Because the known drug-ADR associations in real-world data are far fewer than the unknown ones, the drug-ADR association matrix is very sparse, which greatly degrades the classification performance of machine learning methods. To address this sparsity, we proposed a novel weighted pseudo-labeling framework that mines potential unknown drug-ADR pairs by integrating multiple weighted matrix factorization (MF) models and treats them as pseudo-labeled drug-ADR pairs. The pseudo-labeled data were added to the training set, and the MF model was fine-tuned to improve classification performance. To prevent overfitting to easily found pseudo-labels and to improve pseudo-label quality, we adopted a novel weighting approach for the pseudo-labels. We reproduced the baselines under the same experimental conditions to evaluate the proposed method on sparse data from the Side Effect Resource (SIDER) database. Experimental results showed that our method outperformed the baselines in area under the precision-recall curve (AUPR) and F1-score, and that it maintained the best performance in sparser scenarios. Furthermore, a case study showed that the proposed framework effectively predicted real-world ADRs.
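The pipeline described in the abstract (fit several weighted MF models, promote high-scoring unknown pairs to weighted pseudo-labels, then fine-tune) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the gradient-descent solver, the weight of 0.2 on unknown entries, varying only the random seed across ensemble members, the top-k selection rule, the pseudo-label weight (clipped mean score damped by ensemble disagreement), and the function names `weighted_mf` and `select_pseudo_labels`. The data is a random toy matrix, not SIDER.

```python
import numpy as np

def weighted_mf(Y, W, rank=8, lam=0.1, lr=0.02, epochs=300, seed=0, U=None, V=None):
    """Fit Y ~ U @ V.T by gradient descent on a weighted squared error.

    Passing U and V continues training from existing factors (fine-tuning).
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = rng.normal(scale=0.1, size=(n, rank)) if U is None else U.copy()
    V = rng.normal(scale=0.1, size=(m, rank)) if V is None else V.copy()
    for _ in range(epochs):
        E = W * (U @ V.T - Y)        # per-entry weighted residual
        grad_U = E @ V + lam * U     # gradient w.r.t. U, with L2 penalty
        grad_V = E.T @ U + lam * V   # gradient w.r.t. V, with L2 penalty
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

def select_pseudo_labels(Y, models, k=10):
    """Pick the k unknown (Y == 0) pairs with the highest mean ensemble score."""
    preds = np.stack([U @ V.T for U, V in models])
    mean, std = preds.mean(axis=0), preds.std(axis=0)
    candidates = np.where(Y == 0, mean, -np.inf)   # exclude known positives
    top = np.argsort(candidates, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(top, Y.shape)
    # Assumed weighting: confident, consistent predictions get larger weights.
    weights = np.clip(mean[rows, cols], 0.0, 1.0) / (1.0 + std[rows, cols])
    return rows, cols, weights

# Toy sparse drug-ADR matrix: 30 drugs x 20 ADRs, about 10% known associations.
rng = np.random.default_rng(42)
Y = (rng.random((30, 20)) < 0.1).astype(float)
W = np.where(Y == 1, 1.0, 0.2)   # down-weight unknown entries (assumed value)

# 1) Ensemble of weighted MF models that differ only in initialization.
models = [weighted_mf(Y, W, seed=s) for s in range(3)]

# 2) Promote the top-scoring unknown pairs to weighted pseudo-labels.
rows, cols, weights = select_pseudo_labels(Y, models, k=10)
Y_aug, W_aug = Y.copy(), W.copy()
Y_aug[rows, cols] = 1.0
W_aug[rows, cols] = weights

# 3) Fine-tune one model on the augmented training matrix.
U, V = weighted_mf(Y_aug, W_aug, epochs=50, U=models[0][0], V=models[0][1])
scores = U @ V.T   # final drug-ADR association scores
```

Giving each pseudo-label its own weight, instead of treating it as a full positive, limits how much a single uncertain prediction can pull the fine-tuned factors, which is the general motivation for weighting pseudo-labels.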