Darla Moore School of Business, University of South Carolina, Columbia, SC, 29208, USA.
Robert B. Willumstad School of Business, Adelphi University, Garden City, NY, 11530, USA.
Sci Rep. 2024 Oct 9;14(1):23636. doi: 10.1038/s41598-024-74505-2.
Adverse drug reactions (ADRs) are an ongoing public health concern. Because traditional methods of discovering ADRs are costly and limited, it is prudent to predict ADRs through non-invasive methods such as machine learning on existing data. Although various studies address ADR prediction using non-clinical data, a process that leverages both demographic and non-clinical data for ADR prediction is missing. In addition, the importance of individual features in ADR prediction has yet to be fully explored. This study aims to develop an ADR prediction model based on demographic and non-clinical data and to identify the highest contributing factors. We focus on 30 common and severe ADRs reported to the Food and Drug Administration (FDA) between 2012 and 2023. We developed random forest (RF) and deep learning (DL) models that ingest demographic data (e.g., patient age and gender) and non-clinical data, which include chemical, molecular, and biological drug characteristics. We unified the demographic and non-clinical data sources in a single, complete dataset for ADR prediction. Model performance was assessed via the area under the receiver operating characteristic curve (AUC) and the mean average precision (MAP). We demonstrated that our parsimonious models, which include only the top 20 most important features (5 demographic and 15 non-clinical, the latter comprising 13 molecular and 2 biological), achieve ADR prediction performance comparable to a less practical, feature-rich model using all 2,315 features. Specifically, our models achieved AUCs of 0.611 and 0.674 for the RF and DL algorithms, respectively. We hope our research provides researchers and clinicians with valuable insights and facilitates future research designs by identifying top ADR predictors (including demographic information) and practical parsimonious models.
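The comparison between a feature-rich model and a parsimonious top-20 model can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic (generated with scikit-learn's `make_classification`), the 200-feature size is arbitrary, there is a single binary label standing in for one ADR, and features are ranked by the random forest's impurity-based importances, which is an assumption, since the abstract does not state how importance was measured. With a single label the code reports average precision; MAP as described above would presumably average this value over the ADR labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption): 200 features,
# one binary label playing the role of a single ADR.
X, y = make_classification(
    n_samples=1000, n_features=200, n_informative=15, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def fit_and_score(X_tr, X_te):
    """Fit a random forest and return (model, AUC, average precision)."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_tr, y_train)
    prob = model.predict_proba(X_te)[:, 1]  # probability of the positive class
    auc = roc_auc_score(y_test, prob)
    ap = average_precision_score(y_test, prob)
    return model, auc, ap


# Feature-rich model: all features.
full_model, full_auc, full_ap = fit_and_score(X_train, X_test)

# Rank features by impurity-based importance and keep the top 20.
top_k = 20
top_idx = np.argsort(full_model.feature_importances_)[::-1][:top_k]

# Parsimonious model: refit on the top-20 features only.
_, small_auc, small_ap = fit_and_score(X_train[:, top_idx], X_test[:, top_idx])

print(f"all features: AUC={full_auc:.3f}  AP={full_ap:.3f}")
print(f"top {top_k}:       AUC={small_auc:.3f}  AP={small_ap:.3f}")
```

Because the ranking here is computed on the training split only, the test-set scores of the reduced model are not contaminated by feature selection. A real analysis would repeat this per ADR and under cross-validation.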