Baruah Orchid, Parasar Upashya, Borphukan Anirban, Phukan Bikram, Bharali Pankaj, Nagamani Selvaraman, Mahanta Hridoy Jyoti
Department of Information Technology, The Assam Kaziranga University, Jorhat, Assam 785006, India.
Advanced Computation and Data Sciences Division, CSIR North East Institute of Science and Technology, Jorhat, Assam 785006, India.
Comput Biol Chem. 2024 Dec;113:108270. doi: 10.1016/j.compbiolchem.2024.108270. Epub 2024 Oct 28.
The oral route is the preferred route for drug delivery, and oral drugs accordingly represent the largest share of the pharmaceutical market. Human intestinal absorption (HIA) is closely related to oral bioavailability, making it an important factor in predicting drug absorption. In this study, we focus on predicting drug permeability at HIA as a marker for oral bioavailability. A set of 2648 compounds was collected from early as well as recent works and curated to build a robust dataset. Five machine learning (ML) algorithms were trained on a set of molecular descriptors of these compounds, selected after rigorous feature engineering. Additionally, two deep learning models, one based on a graph convolution neural network (GCNN) and one on a graph attention network (GAT), were developed using the same set of compounds to assess predictive performance with automatically extracted features. The numerical analyses show that, of the five ML models, Random Forest and LightGBM achieved accuracies of 87.71% and 86.04% on the test set and 81.43% and 77.30% on the external validation set, respectively. The GCNN- and GAT-based models achieved final accuracies of 77.69% and 78.58% on the test set and 79.29% and 79.42% on the external validation set, respectively. We believe that deploying these models to screen oral drugs can provide promising results, and we have therefore deposited the dataset and models on GitHub (https://github.com/hridoy69/HIA).