Insilico Medicine, Baltimore, Maryland 21218, United States.
Moscow State University, Moscow 119234, Russia.
Mol Pharm. 2018 Oct 1;15(10):4378-4385. doi: 10.1021/acs.molpharmaceut.7b01134. Epub 2018 Mar 5.
Convolutional neural networks (CNNs) have been successfully used to handle three-dimensional data and are a natural match for data with spatial structure, such as 3D molecular structures. However, a direct 3D representation of a molecule, with each atom localized at a single voxel, is too sparse, which leads to poor CNN performance. In this work, we present a novel approach in which atoms are extended to fill nearby voxels using a transformation based on the wave transform. Experimenting on 4.5 million molecules from the ZINC database, we show that our proposed representation leads to better performance of CNN-based autoencoders than either the voxel-based representation or the previously used Gaussian blur of atoms, and we then successfully apply the new representation to classification tasks such as MACCS fingerprint prediction.
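The sparsity problem described above can be illustrated with a small sketch that compares a one-voxel-per-atom grid with grids in which each atom is spread over nearby voxels. Everything here is an illustrative assumption: the toy coordinates, the grid size (16 voxels per side), the voxel edge (0.5 Å), the Gaussian width, and the single channel shared by all atoms. The abstract does not give the exact wave-transform kernel, so `wave_like` (a Gaussian-damped cosine) is a hypothetical stand-in, not the authors' formula.

```python
import math

# Hypothetical toy input: water-like coordinates in angstroms near the origin
# (illustrative only, not data from the paper).
ATOMS = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

GRID = 16              # voxels per side (assumed)
RES = 0.5              # voxel edge length in angstroms (assumed)
HALF = GRID * RES / 2.0  # grid spans [-HALF, HALF) on each axis


def empty_grid():
    """GRID x GRID x GRID nested list of zeros (single channel)."""
    return [[[0.0] * GRID for _ in range(GRID)] for _ in range(GRID)]


def point_grid(atoms):
    """Direct voxel representation: each atom sets only the voxel it falls in."""
    grid = empty_grid()
    for x, y, z in atoms:
        i, j, k = (math.floor((c + HALF) / RES) for c in (x, y, z))
        if all(0 <= n < GRID for n in (i, j, k)):
            grid[i][j][k] = 1.0
    return grid


def kernel_grid(atoms, kernel):
    """Spread every atom over all voxels with a radial kernel of distance r."""
    grid = empty_grid()
    for i in range(GRID):
        for j in range(GRID):
            for k in range(GRID):
                centre = tuple((n + 0.5) * RES - HALF for n in (i, j, k))
                grid[i][j][k] = sum(kernel(math.dist(centre, a)) for a in atoms)
    return grid


def gaussian(r, sigma=1.0):
    """Gaussian blur of an atom (the baseline named in the abstract)."""
    return math.exp(-r * r / (2 * sigma * sigma))


def wave_like(r, sigma=1.0, wavelength=2.0):
    """Hypothetical wave-style kernel: Gaussian-damped cosine, NOT the paper's formula."""
    return gaussian(r, sigma) * math.cos(2 * math.pi * r / wavelength)


def occupancy(grid, eps=1e-3):
    """Fraction of voxels whose absolute value exceeds eps."""
    values = [v for plane in grid for row in plane for v in row]
    return sum(abs(v) > eps for v in values) / len(values)


if __name__ == "__main__":
    for name, g in [("point", point_grid(ATOMS)),
                    ("gaussian", kernel_grid(ATOMS, gaussian)),
                    ("wave-like", kernel_grid(ATOMS, wave_like))]:
        print(f"{name:10s} occupancy = {occupancy(g):.4f}")
```

Counting occupied voxels makes the sparsity visible: the point grid fills only one voxel per atom, while both smoothed grids spread each atom over many neighbouring voxels, and the damped-cosine kernel also yields negative values around each atom.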