Iqbal Javed, Vogt Martin, Bajorath Jürgen
Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, 53115, Bonn, Germany.
J Cheminform. 2020 May 18;12(1):34. doi: 10.1186/s13321-020-00436-5.
Activity landscapes (ALs) are graphical representations that combine compound similarity and activity data. ALs are constructed for visualizing local and global structure-activity relationships (SARs) contained in compound data sets. Three-dimensional (3D) ALs are reminiscent of geographical maps where differences in landscape topology mirror different SAR characteristics. 3D AL models can be stored as differently formatted images and are thus amenable to image analysis approaches, which have thus far not been considered in the context of graphical SAR analysis. In this proof-of-concept study, 3D ALs were constructed for a variety of compound activity classes, and 3D AL image variants of varying topology and information content were generated and classified. To this end, convolutional neural networks (CNNs) were initially applied to images of original 3D AL models, taken from different viewpoints, with color-coding reflecting compound potency information. Images of 3D AL models were transformed into variants from which one-dimensional features were extracted. Other machine learning approaches, including support vector machine (SVM) and random forest (RF) algorithms, were applied to derive models on the basis of such features. In addition, SVM and RF models were trained using other features obtained from images through edge filtering. Machine learning was able to accurately distinguish between 3D AL image variants with different topology and information content. Overall, CNNs, which directly learned feature representations from 3D AL images, achieved the highest classification accuracy. Predictive performance for CNN, SVM, and RF models was highest for image variants emphasizing topological elevation. In addition, SVM models trained on rudimentary images from edge filtering classified such images with high accuracy, which further supported the critical role of altitude-dependent topological features for image analysis and predictions.
Taken together, the findings of our proof-of-concept investigation indicate that image analysis has considerable potential for graphical SAR exploration to systematically infer different SAR characteristics from topological features of 3D ALs.
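As an illustration of the edge-filtering step mentioned in the abstract, the following minimal sketch derives a one-dimensional feature vector from a landscape-like image. It rests on several assumptions: the abstract does not name the edge filter, so a Sobel operator is used here purely as an example; the image is a synthetic height map built from Gaussian peaks rather than a real 3D AL rendering; and the function names (`make_height_map`, `sobel_magnitude`, `edge_histogram`) are hypothetical and not taken from the study.

```python
import math

# Standard 3x3 Sobel kernels (an assumed choice of edge filter).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def make_height_map(size, peaks):
    """Build a synthetic size x size 'landscape' as a sum of Gaussian peaks.

    Each peak is (row, col, height, sigma). This only stands in for an
    image of a 3D activity landscape.
    """
    grid = [[0.0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            grid[r][c] = sum(
                h * math.exp(-((r - pr) ** 2 + (c - pc) ** 2) / (2.0 * s * s))
                for pr, pc, h, s in peaks
            )
    return grid


def sobel_magnitude(img):
    """Return the Sobel gradient magnitude; border pixels are left at 0."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    v = img[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * v
                    gy += SOBEL_Y[i][j] * v
            out[r][c] = math.hypot(gx, gy)
    return out


def edge_histogram(edges, bins=8):
    """Flatten an edge image into a normalized histogram (1D feature vector)."""
    flat = [v for row in edges for v in row]
    top = max(flat)
    if top <= 0.0:
        # No edges at all: put every pixel into the lowest bin.
        return [1.0] + [0.0] * (bins - 1)
    counts = [0] * bins
    for v in flat:
        counts[min(int(v / top * bins), bins - 1)] += 1
    return [n / len(flat) for n in counts]


# A 'rugged' landscape (narrow peaks) versus a 'smooth' one (one broad hill).
rugged = make_height_map(32, [(8, 8, 1.0, 1.5), (8, 24, 1.0, 1.5),
                              (24, 8, 1.0, 1.5), (24, 24, 1.0, 1.5)])
smooth = make_height_map(32, [(16, 16, 1.0, 12.0)])

rugged_features = edge_histogram(sobel_magnitude(rugged))
smooth_features = edge_histogram(sobel_magnitude(smooth))
```

Feature vectors of this kind could then be passed to a classifier such as an SVM or RF model. The histogram is only one possible way to reduce an edge image to one-dimensional features; the abstract does not specify how the study did this.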