Hajim Wesam Ibrahim, Zainudin Suhaila, Daud Kauthar Mohd, Alheeti Khattab
Department of Applied Geology, College of Sciences, University of Tikrit, Tikrit, Salah ad Din, Iraq.
Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia.
PeerJ Comput Sci. 2024 Dec 23;10:e2520. doi: 10.7717/peerj-cs.2520. eCollection 2024.
Advanced machine learning (ML) and deep learning (DL) methods have recently been applied to drug response prediction (DRP). These models draw on genomic profiles, such as extensive drug screening data and cell line data, to predict drug response. Of the two, the DL-based approaches have learned such features better. However, because drug response datasets are multidimensional and noisy, prior knowledge such as pathway data is sometimes discarded as irrelevant. Optimized feature learning and extraction processes are suggested to handle this problem. First, the noise and class imbalance problems must be tackled to avoid low identification accuracy, long prediction times, and poor applicability. This article applies the Non-Negativity-Constrained Auto Encoder (NNCAE) network to tackle these issues, enhance the adaptive search for the optimal sliding-window size, and ensure that deep network architectures are adept at learning the vital hidden features. The NNCAE methodology is applied after the standard pre-processing procedures to handle the noise and class imbalance problems. The features learned from this class-balanced, denoised input data are then used to train the proposed hybrid classifier. The classification model, the Golden Eagle Optimization-based Convolutional Long Short-Term Memory neural network (GEO-Conv-LSTM), integrates Convolutional Neural Network (CNN) and LSTM models, with parameter tuning performed by the GEO algorithm. Evaluations on two large datasets from the Genomics of Drug Sensitivity in Cancer (GDSC) repository show that the proposed NNCAE-GEO-Conv-LSTM approach achieves accuracies of 96.99% and 97.79%, respectively, with reduced processing time and error rate for the DRP problem.
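The abstract does not say how the non-negativity constraint in NNCAE is enforced. One common formulation, assumed here purely for illustration, adds a quadratic penalty on negative weights to the autoencoder's reconstruction loss, so that training pushes the weights toward non-negative values. The sketch below shows only that penalty term and its gradient. The function names, `alpha`, and `lr` are hypothetical and are not taken from the paper.

```python
def nonneg_penalty(weights, alpha=0.5):
    """Quadratic penalty on negative entries only: (alpha / 2) * sum(min(w, 0)^2).

    Assumption: this is one common way to express a non-negativity
    constraint; the paper's exact formulation is not given in the abstract.
    """
    return 0.5 * alpha * sum(min(w, 0.0) ** 2 for row in weights for w in row)


def nonneg_penalty_grad(weights, alpha=0.5):
    """Gradient of the penalty: alpha * w for negative w, 0 otherwise."""
    return [[alpha * min(w, 0.0) for w in row] for row in weights]


def penalty_step(weights, lr=0.1, alpha=0.5):
    """One gradient-descent step on the penalty alone.

    The reconstruction loss is deliberately omitted, so this only shows
    how negative weights are pulled toward zero.
    """
    grad = nonneg_penalty_grad(weights, alpha)
    return [
        [w - lr * g for w, g in zip(row, grad_row)]
        for row, grad_row in zip(weights, grad)
    ]


# Hypothetical 2x2 encoder weight matrix with two negative entries.
W = [[0.5, -1.0], [-2.0, 0.0]]
W_next = penalty_step(W)
```

In a full autoencoder this term would be added to the reconstruction error and both would be minimized together. Non-negative entries are left untouched by the penalty, while negative entries shrink in magnitude at each step.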