Nikonenko Aleksandra, Zankov Dmitry, Baskin Igor, Madzhidov Timur, Polishchuk Pavel
Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic.
A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlevskaya Str. 18, 420008, Kazan, Russia.
Mol Inform. 2021 Nov;40(11):e2060030. doi: 10.1002/minf.202060030. Epub 2021 Aug 3.
The most widely used QSAR approaches are mainly based on 2D molecular representations, which ignore the stereoconfiguration and conformational flexibility of compounds. 3D QSAR uses a single conformer of each compound, and that conformer is difficult to choose reasonably. 4D QSAR uses multiple conformers to overcome the issues of 2D and 3D methods. However, many existing 4D QSAR models require conformers to be pre-aligned, while alignment-independent approaches often ignore the stereoconfiguration of compounds. In this study, we propose a QSAR modeling approach that transforms chirality-aware 3D pharmacophore descriptors of individual conformers into a set of latent variables representing the whole conformer set of a molecule. This is achieved by clustering together all conformers of all training set compounds. The final representation of a compound is a bit string encoding the cluster membership of its conformers. In our study we used Random Forest, but this representation can be combined with any machine learning method. We compared this approach with conventional 2D and 3D approaches on multiple data sets and investigated its sensitivity to two tuning parameters: the number of conformers and the number of clusters.
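The clustering-to-bit-string step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the clustering algorithm, so a plain k-means with deterministic farthest-point seeding stands in for it; the two-dimensional vectors are toy stand-ins for chirality-aware 3D pharmacophore descriptors; and the names `kmeans`, `nearest`, `bit_string`, and the `mol_*` identifiers are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to the given conformer descriptor."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, k, n_iter=20):
    """Toy k-means (assumed here; the abstract does not specify the method)."""
    # Deterministic farthest-point seeding: start from the first point, then
    # repeatedly add the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def bit_string(conformers, centroids):
    """Bit i is 1 if at least one conformer of the compound falls in cluster i."""
    bits = [0] * len(centroids)
    for conf in conformers:
        bits[nearest(conf, centroids)] = 1
    return bits


# Hypothetical training set: compound -> descriptor vectors of its conformers.
train = {
    "mol_A": [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1]],
    "mol_B": [[5.2, 4.9], [4.8, 5.0]],
    "mol_C": [[0.1, 0.2], [9.9, 0.1]],
}

# Cluster all conformers of all training compounds together.
pooled = [conf for confs in train.values() for conf in confs]
centroids = kmeans(pooled, k=3)

# Encode each compound by the clusters its conformers occupy.
fingerprints = {name: bit_string(confs, centroids) for name, confs in train.items()}
print(fingerprints)
# {'mol_A': [1, 0, 1], 'mol_B': [0, 0, 1], 'mol_C': [1, 1, 0]}
```

The resulting fixed-length bit strings no longer depend on how many conformers a compound has or on any alignment between them, so they can be passed as ordinary feature vectors to a learner such as a Random Forest. A new compound is encoded the same way, by assigning each of its conformers to the nearest training-set centroid.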