A novel weighted pseudo-labeling framework based on matrix factorization for adverse drug reaction prediction.

Authors

Chen Junheng, Han Fangfang, He Mingxiu, Shi Yiyang, Cai Yongming

Affiliations

School of Medical Information and Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China.

NMPA Key Laboratory for Technology Research and Evaluation of Pharmacovigilance, Guangzhou, 510006, China.

Publication information

BMC Bioinformatics. 2025 Feb 17;26(1):54. doi: 10.1186/s12859-025-06053-z.

DOI:10.1186/s12859-025-06053-z

PMID:39962381

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11831795/

Abstract

Adverse drug reactions (ADRs) are among the global public health events that seriously endanger human life and cause high economic burdens. Predicting the likelihood of their occurrence and taking early, effective response measures is therefore of great significance. Constructing an association matrix between drugs and their adverse reactions, followed by effective mining of those associations, is one of the current strategies for predicting ADRs from accessible public data. Because the number of known ADRs in real-world data is far smaller than the number of unknown ones, the drug-ADR association matrix is very sparse, which greatly degrades the classification performance of machine learning methods. To address this sparsity, we proposed a novel weighted pseudo-labeling framework that integrates multiple weighted matrix factorization (MF) models to mine potential unknown drug-ADR pairs, which are then treated as pseudo-labeled drug-ADR pairs. The pseudo-labeled data are added to the training set, and the MF model is fine-tuned to improve classification performance. To prevent overfitting to easily found pseudo-labels and to improve pseudo-label quality, a novel weighting approach for pseudo-labels was adopted. We reproduced the baselines under the same experimental conditions to evaluate the proposed method on sparse data from the Side Effect Resource (SIDER) database. Experimental results showed that our method outperformed the other baselines in area under the precision-recall curve (AUPR) and F1-score, and that it maintained the best performance in sparser scenarios. Furthermore, we conducted a case study, and the results showed that the proposed framework efficiently predicted ADRs in the real world.
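
The pipeline described in the abstract (fit several weighted MF models, promote the highest-scoring unknown drug-ADR pairs to pseudo-labels, then fine-tune with weighted pseudo-labels) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the squared-error loss, the gradient-descent solver, the negative-entry weights in `neg_weights`, the `top_k` selection rule, and the use of the clipped ensemble score as the pseudo-label weight are all assumptions made for illustration, and every function name is hypothetical. The abstract does not specify the paper's actual weighting scheme.

```python
import numpy as np

def fit_weighted_mf(Y, W, rank=4, lr=0.02, reg=0.05, epochs=300, seed=0, init=None):
    """Reduce sum(W * (Y - U @ V.T)**2) plus an L2 penalty by gradient descent."""
    rng = np.random.default_rng(seed)
    if init is None:
        U = 0.1 * rng.standard_normal((Y.shape[0], rank))
        V = 0.1 * rng.standard_normal((Y.shape[1], rank))
    else:
        U, V = init[0].copy(), init[1].copy()  # warm start for fine-tuning
    for _ in range(epochs):
        E = W * (Y - U @ V.T)                  # weighted residual
        step_U = E @ V - reg * U               # descent directions, both computed
        step_V = E.T @ U - reg * V             # from the same (old) factors
        U += lr * step_U
        V += lr * step_V
    return U, V

def pseudo_label(Y, neg_weights=(0.1, 0.2, 0.4), top_k=10):
    """Average several weighted MF models and pick the top-scoring unknown pairs."""
    models = []
    for seed, alpha in enumerate(neg_weights):
        W = np.where(Y == 1, 1.0, alpha)       # assumed: unknown entries get weight alpha
        models.append(fit_weighted_mf(Y, W, seed=seed))
    mean_score = np.mean([U @ V.T for U, V in models], axis=0)
    masked = np.where(Y == 0, mean_score, -np.inf)   # only unknown pairs are candidates
    flat = np.argsort(masked, axis=None)[::-1][:top_k]
    rows, cols = np.unravel_index(flat, Y.shape)
    # Assumed weighting rule: ensemble score clipped to [0, 1].
    weights = np.clip(mean_score[rows, cols], 0.0, 1.0)
    return rows, cols, weights, models[0]

def fine_tune(Y, rows, cols, weights, init, alpha=0.1):
    """Add weighted pseudo-labels to the training matrix and continue training."""
    Y_aug = Y.astype(float).copy()
    W_aug = np.where(Y == 1, 1.0, alpha)
    Y_aug[rows, cols] = 1.0
    W_aug[rows, cols] = weights
    return fit_weighted_mf(Y_aug, W_aug, init=init)

# Toy sparse drug-ADR matrix (30 drugs x 20 ADRs); synthetic, not SIDER data.
rng = np.random.default_rng(42)
Y = (rng.random((30, 20)) < 0.15).astype(float)
rows, cols, weights, base = pseudo_label(Y, top_k=10)
U, V = fine_tune(Y, rows, cols, weights, init=base)
scores = U @ V.T   # higher score = more likely drug-ADR association
```

Weighting each pseudo-label by a confidence value, rather than treating it as a full positive, limits how strongly a possibly wrong pseudo-label can pull the factors during fine-tuning.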

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1759/11831795/18e56be1a94a/12859_2025_6053_Fig1_HTML.jpg

Similar articles

Ontology-based literature mining and class effect analysis of adverse drug reactions associated with neuropathy-inducing drugs.

J Biomed Semantics. 2018 Jun 7;9(1):17. doi: 10.1186/s13326-018-0185-x.

Machine learning-based identification and rule-based normalization of adverse drug reactions in drug labels.

BMC Bioinformatics. 2019 Dec 23;20(Suppl 21):707. doi: 10.1186/s12859-019-3195-5.

Mining Real-World Big Data to Characterize Adverse Drug Reaction Quantitatively: Mixed Methods Study.

J Med Internet Res. 2024 May 3;26:e48572. doi: 10.2196/48572.

Supervised signal detection for adverse drug reactions in medication dispensing data.

Comput Methods Programs Biomed. 2018 Jul;161:25-38. doi: 10.1016/j.cmpb.2018.03.021. Epub 2018 Apr 14.

Large-scale prediction of adverse drug reactions using chemical, biological, and phenotypic properties of drugs.

J Am Med Inform Assoc. 2012 Jun;19(e1):e28-35. doi: 10.1136/amiajnl-2011-000699.

Predicting adverse drug reactions of combined medication from heterogeneous pharmacologic databases.

BMC Bioinformatics. 2018 Dec 31;19(Suppl 19):517. doi: 10.1186/s12859-018-2520-8.

Filtering big data from social media--Building an early warning system for adverse drug reactions.

J Biomed Inform. 2015 Apr;54:230-40. doi: 10.1016/j.jbi.2015.01.011. Epub 2015 Feb 14.

FaxMatch: Multi-Curriculum Pseudo-Labeling for semi-supervised medical image classification.

Med Phys. 2023 May;50(5):3210-3222. doi: 10.1002/mp.16312. Epub 2023 Feb 21.

ADR-DQPU: A Novel ADR Signal Detection Using Deep Reinforcement and Positive-Unlabeled Learning.

IEEE J Biomed Health Inform. 2025 Feb;29(2):831-839. doi: 10.1109/JBHI.2024.3492005. Epub 2025 Feb 10.
