Chaurasia Anushka, Kumar Deepak
Department of Computer Science and Engineering, National Institute of Technology Meghalaya, India.
Department of Computer Engineering, National Institute of Technology Kurukshetra, India.
J Biomed Inform. 2025 Jun;166:104832. doi: 10.1016/j.jbi.2025.104832. Epub 2025 Apr 28.
Adverse Drug Reactions (ADRs) during pregnancy pose significant risks to both the mother and the fetus. Conventional approaches to predicting ADRs are inadequate because ethical restrictions prevent medication studies in pregnant women, which limits the available data. Computational techniques have therefore shown promise for ADR prediction. However, most of these techniques focus on the general population and suffer from class imbalance and a lack of model interpretability. This work proposes PregAN-NET, an ensemble learning-based framework that addresses class imbalance by generating synthetic data with a Conditional Tabular Generative Adversarial Network (CTGAN) and integrates a neural network and gradient boosting into a Boosted Neural Ensemble (BNE) architecture to classify drugs as safe or unsafe based on their adverse reactions during pregnancy. The SHAP method is employed to improve the post-hoc interpretability of the BNE architecture by analyzing how individual features contribute to each prediction. The framework has been applied to chemical and biological properties from PubChem and DrugBank, with class labels from the ADReCS database. For data balancing, CTGAN showed a 2% to 5% performance improvement over SMOTE. The BNE architecture outperformed six state-of-the-art methods, achieving mean ROC-AUC scores between 77.00% and 90.00% for chemical data, between 66.00% and 74.00% for biological data, and between 70.00% and 75.00% for the combined datasets. The top 20 contributing features for each type of drug property have also been identified.
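To make the reported metric concrete, the following is a minimal sketch of how a ROC-AUC score can be computed for a two-model ensemble. It is not the authors' implementation: the abstract does not state how the BNE architecture combines the neural network and the gradient boosting model, so simple probability averaging (soft voting) is assumed here purely for illustration. The function names `roc_auc` and `soft_vote` and all of the numbers are hypothetical and do not come from the paper.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def soft_vote(probs_a, probs_b):
    """Average two models' predicted probabilities (an assumed combination
    rule, not necessarily the one used by BNE)."""
    return [(a + b) / 2 for a, b in zip(probs_a, probs_b)]


# Toy, made-up data: 1 = unsafe drug, 0 = safe drug (imbalanced on purpose).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical predicted probabilities of "unsafe" from two base models.
nn_probs = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1, 0.3, 0.6]
gb_probs = [0.8, 0.7, 0.3, 0.2, 0.4, 0.1, 0.6, 0.3]
ensemble_probs = soft_vote(nn_probs, gb_probs)

print("neural network ROC-AUC:   ", round(roc_auc(labels, nn_probs), 4))
print("gradient boosting ROC-AUC:", round(roc_auc(labels, gb_probs), 4))
print("averaged ensemble ROC-AUC:", round(roc_auc(labels, ensemble_probs), 4))
```

ROC-AUC is a common choice for imbalanced problems because it depends only on how positives are ranked relative to negatives, not on a single decision threshold. The toy scores are chosen so that the two models make different ranking errors, which is the situation in which averaging can help; they say nothing about the results reported above.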