Jamal Salma, Arora Sonam, Scaria Vinod
CSIR Open Source Drug Discovery Unit (CSIR-OSDD), Anusandhan Bhawan, Delhi, India.
Delhi Technological University, Delhi, India.
PLoS One. 2016 Sep 13;11(9):e0083032. doi: 10.1371/journal.pone.0083032. eCollection 2016.
The dynamic and differential regulation and expression of genes is largely governed by complex interactions among a subset of biomolecules in the cell, operating at multiple levels from genome organisation to post-translational regulation of proteins. The regulatory layer contributed by epigenetics has attracted considerable interest recently. As currently understood, this layer largely comprises DNA modifications, histone modifications and noncoding RNA regulation, together with the interplay between these major components. Epigenetic regulation has recently been shown to be central to the development of a number of disease processes. The availability of datasets from high-throughput screens of molecules for biological properties offers a new opportunity to develop computational methodologies that enable in-silico screening of large molecular libraries.
In the present study, we used data from high-throughput screens for inhibitors of epigenetic modifiers. Computational predictive models were constructed from molecular descriptors. Two supervised machine learning algorithms, Naive Bayes and Random Forest, were used to generate predictive models for small-molecule inhibitors of histone methyltransferases and demethylases. Random Forest, with an accuracy of 80%, was identified as the most accurate classifier. We further complemented the study with a substructure search approach to extract probable pharmacophores from the active molecules that could lead to drug molecules.
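The descriptor-based classification step described above can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes each molecule has already been reduced to a vector of binary descriptor bits, and it uses a hand-written Bernoulli Naive Bayes classifier so that it runs with the Python standard library alone. The function names (`train_bernoulli_nb`, `predict`, `accuracy`) and the six toy molecules are hypothetical and invented purely for illustration; they are not data from the study.

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on binary descriptor vectors."""
    n_features = len(X[0])
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        # Laplace-smoothed probability that each bit is set within this class
        p_on = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                for j in range(n_features)]
        # Store the log class prior alongside the per-bit probabilities
        model[label] = (math.log(len(rows) / len(X)), p_on)
    return model

def predict(model, x):
    """Return the class with the highest log posterior for one molecule."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, p_on) in model.items():
        score = log_prior
        for bit, p in zip(x, p_on):
            score += math.log(p if bit else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, X, y):
    """Fraction of molecules whose predicted label matches the true label."""
    hits = sum(predict(model, x) == t for x, t in zip(X, y))
    return hits / len(y)

# Toy, made-up data (assumption): 4 hypothetical descriptor bits per
# molecule; label 1 = active inhibitor, 0 = inactive.
X_train = [
    [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0],
    [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1],
]
y_train = [1, 1, 1, 0, 0, 0]

model = train_bernoulli_nb(X_train, y_train)
train_acc = accuracy(model, X_train, y_train)
pred_new = predict(model, [1, 1, 0, 0])
```

In practice, accuracy would be estimated on held-out molecules rather than on the training set, and a Random Forest would normally come from an established library such as scikit-learn's `RandomForestClassifier` rather than being written by hand.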
We show that appropriate computational algorithms can be used effectively to learn molecular and structural correlates of the biological activities of small molecules. The models developed could be used to screen large molecular libraries, identify potential new biological activities of molecules, and prioritise them for in-depth biological assays. To the best of our knowledge, this is the first and most comprehensive computational analysis towards understanding the activities of small-molecule inhibitors of epigenetic modifiers.