Life Science School, Beijing University of Chinese Medicine, Beijing 100029, China.
Chinese Medicine School, Beijing University of Chinese Medicine, Beijing 100029, China.
J Chem Inf Model. 2020 Oct 26;60(10):4603-4613. doi: 10.1021/acs.jcim.0c00568. Epub 2020 Sep 1.
Oral bioavailability (OBA)-related pharmacokinetic properties, such as aqueous solubility, lipophilicity, and intestinal membrane permeability, play a significant role in drug discovery. However, their measurement is usually costly and time-consuming. Therefore, prediction models based on diverse approaches have been established in recent decades. Computational prediction of molecular properties has become an important step in drug discovery, aiming to identify potential drug-like candidates and reduce costs. However, limitations related to dataset capacity and algorithm adaptation still place restrictions on the applicability of the related models. In this study, we considered both dataset and algorithm optimization to address the challenge of predicting OBA-related molecular properties. Benchmark datasets of aqueous solubility (log ), lipophilicity (log ), and membrane permeability measured using the Caco-2 cell line (log ) were constructed by merging and calibrating experimental data from diverse articles and databases. Then, a novel molecular property prediction model, called a multiembedding-based synthetic network (MESN), was generated by applying a deep learning algorithm based on the synthesis of multiple types of molecular embeddings. MESN achieves performance improvements over other state-of-the-art methods for the prediction of aqueous solubility, lipophilicity, and membrane permeability. Results were also obtained using several other algorithms and independent validation datasets as a control study. Moreover, a dimension reduction analysis (based on t-distributed stochastic neighbor embedding, t-SNE) and an atomic feature similarity analysis showed that the molecular embeddings extracted from the MESN model exhibit good clustering and diversity. 
Overall, given the fundamental role of the data and the superior prediction performance of the model, we highlight the applicability of MESN to benchmark datasets and its potential utility in drug discovery-related molecular property prediction.