Jagiellonian University, Faculty of Physics, Astronomy and Applied Computer Science, S. Łojasiewicza Street 11, 30-348 Kraków, Poland.
Int J Mol Sci. 2019 May 2;20(9):2175. doi: 10.3390/ijms20092175.
Biologically active chemical compounds may provide remedies for several diseases. Machine learning techniques applied to drug discovery are cheaper and faster than wet-lab experiments and can identify molecules with the expected pharmacological activity more effectively. It is therefore urgent and essential to develop more representative descriptors and more reliable classification methods for predicting molecular activity accurately. In this paper, we investigate the potential of a novel representation based on Spherical Harmonics, fed into a Probabilistic Classification Vector Machines classifier (the combination is named SHPCVM), for the compound activity prediction task. We use representation learning to acquire features that describe the molecules as precisely as possible. To verify the performance of SHPCVM, ten-fold cross-validation tests are performed on twenty-one G protein-coupled receptors (GPCRs). The experimental outcomes (accuracy of 0.86), assessed by classification accuracy, precision, recall, Matthews' Correlation Coefficient and Cohen's kappa, show that our relatively short Spherical Harmonics-based representation combined with Probabilistic Classification Vector Machines achieves very satisfactory performance on GPCRs.
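For readers unfamiliar with the five evaluation measures named in the abstract, the following minimal Python sketch computes them from the confusion counts of a binary active/inactive classifier. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `binary_metrics` and the example counts are hypothetical, and the abstract does not specify how the scores were aggregated across the ten folds or the twenty-one receptors.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute five standard metrics from binary confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives. The name and signature are hypothetical.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # Matthews' Correlation Coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical counts for one cross-validation fold (not from the paper).
scores = binary_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts, accuracy, precision and recall are each 0.8, while MCC and Cohen's kappa are both 0.6 (up to floating-point rounding). The example shows why the two chance-corrected measures are reported alongside accuracy: they are lower than the raw accuracy even on a balanced set.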