School of Chemistry, Food and Pharmacy, University of Reading, Reading RG6 6AD, U.K.
J Chem Inf Model. 2024 Nov 25;64(22):8464-8480. doi: 10.1021/acs.jcim.4c01309. Epub 2024 Nov 7.
Attention-based decoder models were used to generate libraries of novel inhibitors for the HMG-Coenzyme A reductase (HMGCR) enzyme. These deep neural network models were pretrained on previously synthesized drug-like molecules from the ZINC15 database to learn the syntax of SMILES strings and then fine-tuned on a set of ∼1000 molecules that inhibit HMGCR. The number of layers used for pretraining and fine-tuning was varied to find the optimal balance for robust library generation. Virtual screening libraries were also generated at different temperatures and with different numbers of input tokens (prompt lengths) to find the settings that give the most desirable molecular properties. The resulting libraries were screened against several criteria, including IC50 values predicted by a dense neural network (DNN) trained on experimental HMGCR IC50 values, docking scores from AutoDock Vina (via Dockstring), a calculated quantitative estimate of druglikeness, and Tanimoto similarity to known HMGCR inhibitors. Models with a 50/50 or 25/75% pretrained/fine-tuned layer split, a nonzero temperature, and shorter prompt lengths produced the most robust libraries, and the DNN-predicted IC50 values correlated well with docking scores and statin similarity. Of the generated molecules, 42% were classified as statin-like by k-means clustering, with the rosuvastatin-like group having the lowest IC50 values and lowest docking scores.
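The role of temperature in library generation can be illustrated with a minimal sketch. The abstract does not say how temperature enters the sampling step, so this assumes the common convention of dividing the decoder's output logits by the temperature before the softmax, with a temperature of zero treated as greedy (argmax) decoding. The function names, the toy SMILES vocabulary, and the logit values are all hypothetical and are not taken from the paper.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive for softmax sampling")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Pick the index of the next token from the logits."""
    if temperature == 0:
        # Assumed convention: zero temperature means greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical five-token SMILES vocabulary and made-up logits.
vocab = ["C", "N", "O", "(", "="]
logits = [2.0, 1.0, 0.5, 0.2, -1.0]

rng = random.Random(0)
greedy = vocab[sample_token(logits, 0, rng)]
sampled = [vocab[sample_token(logits, 1.0, rng)] for _ in range(10)]
print(greedy, sampled)
```

At zero temperature every run from the same prompt yields the same string, so a library of distinct molecules requires some randomness in the sampling step. Under this assumption, raising the temperature spreads probability onto less likely tokens, which increases diversity at the cost of more invalid or off-target SMILES strings.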