Shin Hyun Kil
Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon 34114, Republic of Korea.
Human and Environmental Toxicology, University of Science and Technology, Daejeon 34113, Republic of Korea.
ACS Omega. 2021 Dec 15;6(51):35757-35768. doi: 10.1021/acsomega.1c05693. eCollection 2021 Dec 28.
Deep learning (DL) models for quantitative structure-activity relationship (QSAR) modeling feed the molecular structure directly to the network, without human-designed descriptors, by representing the molecule as a graph or a string (e.g., SMILES code). However, both representations are too simplified to fully reflect the chemical properties of molecular structures. Because the choice of molecular representation determines which DL architecture can be applied, a new molecular representation can open the way to applying diverse DL networks developed and used in other fields. In this study, a topological distance-based electron interaction (TDEi) tensor was developed, inspired by the quantum mechanical model of the molecule, which defines a molecule in terms of electrons and protons. In the TDEi tensor, the atomic orbitals (AOs) of each atom are represented by an electron configuration (EC) vector, a bit string that records the presence or absence of an electron in each AO, with spin indicated by positive and negative signs. Interactions between EC vectors were calculated from the topological distance between atoms in the molecule. Because this translates a molecular structure into a 3D array, CNN models (a modified VGGNet) were applied to the TDEi tensor to predict four physicochemical properties of drug-like compound datasets: melting point (MP; 275,131), lipophilicity (Lipop; 4193), aqueous solubility (Esol; 1127), and hydration free energy (Freesolv; 639). The models achieved good prediction accuracy. Principal component analysis (PCA) showed that features extracted from deeper layers correlated more strongly with the target endpoint.
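To make the idea of a signed EC vector and a distance-weighted interaction array concrete, the following is a minimal Python sketch. It is not the published TDEi construction. The abstract does not give the orbital layout, the interaction formula, or the tensor shape, so everything below is an assumption made for illustration: orbitals are limited to 1s through 3p (elements up to Z = 18), each orbital gets one spin-up slot (+1) and one spin-down slot (-1) filled by Hund's rule, and the interaction between atoms i and j is taken as the elementwise product of their EC vectors scaled by 1 / (1 + d), where d is the topological distance in bonds. The function names `ec_vector`, `topological_distances`, and `interaction_tensor` are hypothetical.

```python
from collections import deque

import numpy as np

# (subshell, number of spatial orbitals) in filling order; assumed cutoff at 3p.
SUBSHELLS = [("1s", 1), ("2s", 1), ("2p", 3), ("3s", 1), ("3p", 3)]
# Two slots per orbital: spin-up then spin-down (assumed layout).
EC_LENGTH = 2 * sum(n_orb for _, n_orb in SUBSHELLS)


def ec_vector(atomic_number):
    """Signed bit string: +1 spin-up electron, -1 spin-down electron, 0 empty."""
    if not 1 <= atomic_number <= 18:
        raise ValueError("this sketch only covers elements with Z from 1 to 18")
    vec = np.zeros(EC_LENGTH, dtype=np.int8)
    remaining = atomic_number
    offset = 0
    for _, n_orb in SUBSHELLS:
        in_shell = min(remaining, 2 * n_orb)
        n_up = min(in_shell, n_orb)   # Hund's rule: fill spin-up slots first
        n_down = in_shell - n_up      # then pair with spin-down electrons
        vec[offset:offset + 2 * n_up:2] = 1
        vec[offset + 1:offset + 1 + 2 * n_down:2] = -1
        remaining -= in_shell
        offset += 2 * n_orb
    return vec


def topological_distances(n_atoms, bonds):
    """Shortest path length in bonds between all atom pairs (-1 if unreachable)."""
    neighbors = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist = np.full((n_atoms, n_atoms), -1, dtype=int)
    for start in range(n_atoms):
        dist[start, start] = 0
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[start, v] < 0:
                    dist[start, v] = dist[start, u] + 1
                    queue.append(v)
    return dist


def interaction_tensor(atomic_numbers, bonds):
    """3D array of shape (n_atoms, n_atoms, EC_LENGTH).

    Entry [i, j, :] is the elementwise product of the two EC vectors scaled by
    1 / (1 + d_ij). This weighting is an assumption, not the paper's formula.
    """
    ecs = np.stack([ec_vector(z) for z in atomic_numbers]).astype(float)
    dist = topological_distances(len(atomic_numbers), bonds)
    weight = np.zeros(dist.shape, dtype=float)
    reachable = dist >= 0
    weight[reachable] = 1.0 / (1.0 + dist[reachable])
    return ecs[:, None, :] * ecs[None, :, :] * weight[:, :, None]


# Heavy atoms of ethanol (C-C-O), hydrogens omitted for brevity.
tensor = interaction_tensor([6, 6, 8], [(0, 1), (1, 2)])
print(tensor.shape)  # (3, 3, 18)
```

An array of this kind has the same layout as a multi-channel image, which is what makes image-style CNNs such as VGGNet applicable. Real molecules vary in atom count, so a fixed-size input would also need padding or cropping, which this sketch leaves out.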