Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
Center for Advanced Therapeutics, Institute of Molecular Biosciences, Mahidol University, Nakhon Pathom 73170, Thailand.
Methods. 2024 Oct;230:147-157. doi: 10.1016/j.ymeth.2024.08.003. Epub 2024 Aug 25.
Epigenetics involves reversible modifications that alter gene expression without changing the genetic code itself. Histone deacetylases (HDACs) play a key role in these modifications by removing acetyl groups from lysine residues on histones. Overexpression of HDACs is linked to the proliferation and survival of tumor cells, so HDAC inhibitors (HDACi) are commonly used in cancer treatment. However, pan-HDAC inhibition can cause numerous side effects. Isoform-selective HDAC inhibitors, such as HDAC3 inhibitors (HDAC3i), could therefore be advantageous for treating various medical conditions while minimizing off-target effects. Computational approaches that rely only on SMILES notation, without requiring experimental evidence, have become increasingly popular and necessary for the initial discovery of novel potential therapeutic drugs. In this study, we developed an innovative and high-precision stacked-ensemble framework, called Stack-HDAC3i, which can directly identify HDAC3i using only the SMILES notation. Using an up-to-date benchmark dataset, we first employed both molecular descriptors and Mol2Vec embeddings to generate feature representations that cover the multi-view information embedded in HDAC3i, such as structural and contextual information. Subsequently, these feature representations were used to train baseline models with nine popular machine learning (ML) algorithms. Finally, the probabilistic features derived from the selected baseline models were fused to construct the final stacked model. Both cross-validation and independent tests showed that Stack-HDAC3i is a high-accuracy prediction model with strong generalization ability for identifying HDAC3i. In the independent test, Stack-HDAC3i achieved an accuracy of 0.926 and a Matthews correlation coefficient (MCC) of 0.850, which are 0.44-6.11% and 0.83-11.90% higher, respectively, than those of its constituent baseline models.
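The stacking procedure described above (baseline models trained on separate feature views, their probabilistic outputs fused by a final model, and evaluation by accuracy and MCC) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the data are synthetic Gaussian features rather than molecular descriptors or Mol2Vec embeddings, a toy nearest-centroid scorer stands in for the nine ML algorithms, the meta-learner is assumed to be a logistic regression (the abstract does not name one), and all function and variable names are hypothetical.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


class CentroidModel:
    """Toy baseline (assumption): nearest-centroid scorer on one feature view."""

    def __init__(self, cols):
        self.cols = cols

    def _view(self, row):
        return [row[c] for c in self.cols]

    def fit(self, X, y):
        pos = [self._view(r) for r, t in zip(X, y) if t == 1]
        neg = [self._view(r) for r, t in zip(X, y) if t == 0]
        self.mu_pos = [sum(c) / len(pos) for c in zip(*pos)]
        self.mu_neg = [sum(c) / len(neg) for c in zip(*neg)]
        return self

    def predict_proba(self, X):
        probs = []
        for row in X:
            v = self._view(row)
            d_pos = sum((a - b) ** 2 for a, b in zip(v, self.mu_pos))
            d_neg = sum((a - b) ** 2 for a, b in zip(v, self.mu_neg))
            probs.append(sigmoid(d_neg - d_pos))  # closer to positives -> higher
        return probs


def out_of_fold_probs(cols, X, y, k=5):
    """Score each sample with a model that was not trained on it."""
    probs = [0.0] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held = [i for i in range(len(X)) if i % k == fold]
        model = CentroidModel(cols).fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(held, model.predict_proba([X[i] for i in held])):
            probs[i] = p
    return probs


def fit_logistic(Z, y, lr=1.0, epochs=1000):
    """Meta-learner (assumption): logistic regression by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) - t
            gw = [g + err * zi for g, zi in zip(gw, z)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def make_data(n, rng):
    """Synthetic stand-in data: four Gaussian features, alternating labels."""
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(1.0 if t else -1.0, 1.0) for _ in range(4)] for t in y]
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(200, rng)
X_test, y_test = make_data(100, rng)
views = [(0, 1), (2, 3)]  # hypothetical "descriptor" and "embedding" views

# Level 0: probabilistic features from each baseline model.
meta_train = list(zip(*(out_of_fold_probs(v, X_train, y_train) for v in views)))
base_models = [CentroidModel(v).fit(X_train, y_train) for v in views]
meta_test = list(zip(*(m.predict_proba(X_test) for m in base_models)))

# Level 1: fuse the probabilistic features in the final stacked model.
w, b = fit_logistic(meta_train, y_train)
y_pred = [int(sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) >= 0.5) for z in meta_test]

stack_acc = sum(int(t == p) for t, p in zip(y_test, y_pred)) / len(y_test)
stack_mcc = mcc(y_test, y_pred)
print(f"accuracy={stack_acc:.3f}  MCC={stack_mcc:.3f}")
```

The out-of-fold step is the key design choice in this sketch: the meta-learner is trained on probabilities that each baseline produced for samples it had not seen, which keeps the fused model from simply inheriting the baselines' training-set overfitting. In practice, a library routine such as scikit-learn's `StackingClassifier` could play the same role, although the abstract does not say which tooling was used.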