Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan.
Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan.
J Comput Aided Mol Des. 2021 Feb;35(2):179-193. doi: 10.1007/s10822-020-00361-7. Epub 2021 Jan 4.
Quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) models predict biological activities and molecular properties from the numerical relationship between chemical structures and activity (property) values. Molecular representations are important in QSAR/QSPR analysis. For this purpose, topological information about molecular structures (2D representations) is usually used. However, conformational information may also be important, because molecules exist in three-dimensional space. A three-dimensional molecular representation applicable to diverse compounds, based on the similarity between a test molecule and a set of reference molecules, has been proposed previously. This 3D representation was found to be effective in virtual screening for the early enrichment of active compounds. In this study, we introduced the 3D representation into QSAR/QSPR modeling (regression tasks). Furthermore, we investigated the relative merits of 3D representations over 2D representations with respect to the diversity of the training data sets. For predicting quantum mechanics-based properties, the 3D representations were superior to the 2D ones. For predicting the activity of small molecules against specific biological targets, no consistent difference in performance was observed between the two types of representations, irrespective of the diversity of the training data sets.
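The reference-similarity representation described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes each molecule is a single pre-aligned conformer given as a list of 3D atom coordinates, uses a Gaussian-overlap shape Tanimoto as a stand-in 3D similarity measure, and uses ridge regression as a stand-in learner. All function names (`gaussian_overlap`, `shape_tanimoto`, `reference_similarity_features`, `fit_ridge`) are introduced here for illustration only, and the toy molecules and target values are made up.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy representation: a molecule is a list of pre-aligned 3D atom
# coordinates (one conformer). A real pipeline would generate and align conformers first.
Coords = Sequence[Tuple[float, float, float]]


def gaussian_overlap(a: Coords, b: Coords, alpha: float = 1.0) -> float:
    """Sum of atom-pair Gaussian overlaps exp(-alpha * d^2) between two coordinate sets."""
    total = 0.0
    for (x1, y1, z1) in a:
        for (x2, y2, z2) in b:
            d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
            total += math.exp(-alpha * d2)
    return total


def shape_tanimoto(a: Coords, b: Coords) -> float:
    """Assumed 3D similarity: O_ab / (O_aa + O_bb - O_ab), which lies in (0, 1]."""
    o_ab = gaussian_overlap(a, b)
    return o_ab / (gaussian_overlap(a, a) + gaussian_overlap(b, b) - o_ab)


def reference_similarity_features(mol: Coords, references: List[Coords]) -> List[float]:
    """Descriptor vector: similarity of `mol` to each reference molecule, plus a bias term."""
    return [shape_tanimoto(mol, ref) for ref in references] + [1.0]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def fit_ridge(X: List[List[float]], y: List[float], lam: float = 1e-3) -> List[float]:
    """Ridge weights from (X^T X + lam*I) w = X^T y (the bias weight is penalized too)."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(A, b)


def predict(w: List[float], x: List[float]) -> float:
    """Linear prediction w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


if __name__ == "__main__":
    # Toy molecules and made-up target values, for illustration only.
    ref1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    ref2 = [(0.0, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 3.0, 0.0)]
    train = [ref1, ref2, [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]]
    y = [1.0, 2.0, 1.5]
    X = [reference_similarity_features(m, [ref1, ref2]) for m in train]
    w = fit_ridge(X, y)
    query = [(0.0, 0.0, 0.0), (1.2, 0.4, 0.0)]
    print(predict(w, reference_similarity_features(query, [ref1, ref2])))
```

Each molecule becomes a fixed-length vector of similarities to the reference set regardless of its size, which is what allows the representation to be applied to structurally diverse compounds; any regression model could consume the same vectors. Conformer generation and alignment, which a real 3D pipeline requires, are omitted here.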