Khemani Dr Bharti, Malave Dr Sachin, Shinde Samyukta, Shukla Mandvi, Shikalgar Razzaq, Talwar Harshita
Assistant Professor, A. P. SHAH Institute of Technology, Survey No 12, 13, Opp. Hypercity Mall, Kasarvadavali, Ghodbunder Road, Thane West, Thane, Maharashtra 400615, India.
Head of Computer Engineering Department, A. P. SHAH Institute of Technology, Survey No 12, 13, Opp. Hypercity Mall, Kasarvadavali, Ghodbunder Road, Thane West, Thane, Maharashtra 400615, India.
MethodsX. 2025 Jun 23;15:103460. doi: 10.1016/j.mex.2025.103460. eCollection 2025 Dec.
In the healthcare industry, the growing volume of clinical trial data makes it harder to ensure drug safety and detect adverse drug reactions (ADRs). This study addresses the challenge of accurately detecting Serious Adverse Events (SAEs) in pharmacovigilance, a critical component of drug safety during and after clinical trials. The key problem is the underreporting and delayed detection of ADRs, caused by the heterogeneous nature of medical data, class imbalance, and the limited scope of traditional monitoring techniques. The study proposes a hybrid AI-driven framework that integrates structured data (e.g., patient demographics, lab results) and unstructured data (e.g., clinical notes) to detect ADRs using deep learning and natural language processing (NLP) methods. The objective is to outperform traditional signal detection methods and to provide interpretable predictions that aid clinicians in real time. Using Machine Learning (ML) and Deep Learning (DL) techniques, including Random Forests, Gradient Boosting Machines, and Convolutional Neural Networks (CNNs), the model aims to identify potential ADRs across different patient subgroups. Through feature engineering and techniques that address data imbalance, the model demonstrates improved accuracy and interpretability in predicting ADRs. The CNN model achieved an accuracy of 85%, outperforming traditional models such as Logistic Regression (78%) and Support Vector Machines (80%). These findings suggest that specific demographic and clinical factors significantly influence the likelihood of adverse reactions, offering insights for targeted monitoring and risk mitigation strategies [11].
This research underscores the potential of predictive modeling to enhance pharmacovigilance efforts and ensure safer clinical trial outcomes.
•The research methodology compares supervised learning algorithms, such as Logistic Regression, Random Forest, Gradient Boost, CNN, and genetic algorithms, to identify patterns and anomalies in clinical trial data. BERT and GPT were also employed to support text-based interaction with medical data.
•Performance metrics (accuracy, precision, recall, and F1-score) were applied systematically to evaluate each model. Among the models tested, the CNN model with BERT achieved the highest accuracy, indicating the potential of deep learning for enhancing pharmacovigilance practices.
•These findings suggest that supplying diverse clinical data to advanced ML and NLP techniques can significantly improve the detection of ADRs, leading to better alignment with the fundamental principles of Good Clinical Practice (GCP).
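The four evaluation metrics named above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the `evaluate` helper, the `Metrics` container, and the toy labels are hypothetical and chosen only to show why accuracy alone can mislead when ADR cases are rare, which is the class-imbalance problem the abstract mentions.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task.

    Hypothetical helper for illustration; `positive` marks the ADR class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return Metrics(accuracy, precision, recall, f1)


# Toy, made-up labels: 10 patients, 2 of whom had an ADR (label 1).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A degenerate model that never flags an ADR.
always_negative = [0] * 10

# A model that finds one of the two ADRs and raises one false alarm.
some_detection = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

baseline = evaluate(y_true, always_negative)
candidate = evaluate(y_true, some_detection)

print("always negative:", baseline)
print("some detection: ", candidate)
```

On this toy data both prediction vectors reach the same accuracy, yet only the second detects any ADR at all, which is why precision, recall, and F1 are worth reporting next to accuracy for imbalanced safety data.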