The Cloud Computing Engineering Research Center of Yunnan Province, Key Laboratory of Software Engineering of Yunnan Province, Key Laboratory of Medicinal Chemistry for Natural Resource of Ministry of Education and Yunnan Characteristic Plant Extraction Laboratory, School of Software, School of Pharmacy, Yunnan University, Kunming 650091, China.
Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad115.
Histones are the chief protein components of chromatin, and chemical modifications on histones crucially influence the transcriptional state of the associated genes. Histone-modifying enzymes (HMEs), which add or remove these chemical labels, have emerged as a very important class of drug targets: a few HME inhibitors have been launched as anticancer drugs, and tens of molecules are in clinical trials. To accelerate the discovery of HME inhibitors, machine learning-based predictive models have been developed to enrich active molecules from the vast chemical space. However, compounds with known activity are distributed very unevenly among HMEs, and many targets have fewer than a hundred active samples. In such cases, it is difficult to build effective virtual screening models by applying machine learning directly.
To this end, we propose MetaHMEI, a new Meta-learning-based Histone Modifying Enzymes Inhibitor prediction method. MetaHMEI first uses self-supervised pre-training to obtain high-quality molecular substructure embeddings from a large unlabeled chemical dataset. It then combines a Transformer-based encoder with a meta-learning framework to build a prediction model. MetaHMEI effectively transfers the prior knowledge learned from HMEs with sufficient samples to HMEs with few samples, so the model can produce accurate predictions for HMEs with limited data. Extensive experiments on our collected and curated HME datasets show that MetaHMEI outperforms other methods in few-shot learning settings. Furthermore, we applied MetaHMEI to the virtual screening of inhibitors of the histone demethylase JMJD3 and successfully obtained three small-molecule inhibitors, further supporting the validity of our model.
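The abstract does not name the meta-learning algorithm or give encoder details, so the following is only a rough illustration of the transfer idea, not MetaHMEI itself. As an assumption, it swaps in Reptile (a simple first-order meta-learning algorithm) with a logistic-regression classifier on synthetic feature vectors that stand in for molecular embeddings. All function names (`make_task`, `adapt`, `reptile`) and hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    # w holds one weight per feature plus a trailing bias term.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def loss(w, data):
    # Mean binary cross-entropy over (features, label) pairs.
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict(w, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def adapt(w, support, steps=5, lr=0.5):
    # Inner loop: a few full-batch gradient steps on one target's support set.
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in support:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(support) for wi, g in zip(w, grad)]
    return w


def reptile(tasks, dim, meta_steps=200, meta_lr=0.1, shots=10, seed=0):
    # Outer loop: nudge the shared initialization toward each task's adapted weights.
    rng = random.Random(seed)
    w = [0.0] * (dim + 1)
    for _ in range(meta_steps):
        task = rng.choice(tasks)
        support = rng.sample(task, shots)
        adapted = adapt(w, support)
        w = [wi + meta_lr * (ai - wi) for wi, ai in zip(w, adapted)]
    return w


def make_task(rng, dim, n, shared):
    # Toy "target": labels follow a shared direction plus a task-specific shift.
    direction = [s + rng.gauss(0, 0.3) for s in shared]
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in range(dim)]
        y = 1 if sum(d * xi for d, xi in zip(direction, x)) > 0 else 0
        data.append((x, y))
    return data


rng = random.Random(42)
DIM = 4
shared = [1.0, -1.0, 0.5, 0.0]  # hypothetical structure common to all targets
rich_tasks = [make_task(rng, DIM, 200, shared) for _ in range(8)]  # data-rich targets
scarce_task = make_task(rng, DIM, 60, shared)  # data-poor target
support, query = scarce_task[:10], scarce_task[10:]

meta_init = reptile(rich_tasks, DIM)
few_shot_w = adapt(meta_init, support)
scratch_w = adapt([0.0] * (DIM + 1), support)
print("query loss, meta-init + adapt:", round(loss(few_shot_w, query), 3))
print("query loss, scratch + adapt:  ", round(loss(scratch_w, query), 3))
```

The two printed losses compare adapting from the meta-learned initialization with adapting from zeros, using the same ten labeled examples of the data-poor target. On real data the inputs would be learned molecular representations rather than random vectors, and the classifier would be far richer than a linear model.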