Rensselaer Exploratory Center for Cheminformatics Research, and Department of Chemistry & Chemical Biology, Rensselaer Polytechnic Institute, Troy, New York 12180, United States.
J Phys Chem A. 2011 Nov 17;115(45):12905-18. doi: 10.1021/jp204022u. Epub 2011 Sep 1.
Discontinuous changes in molecular structure (resulting from continuous transformations of molecular coordinates) lead to changes in chemical properties and biological activities that chemists attempt to describe through structure-activity or structure-property relationships (QSAR/QSPR). Such relationships are commonly envisioned in a continuous high-dimensional space of numerical descriptors, referred to as chemistry space. The choice of descriptors defining coordinates within chemistry space and the choice of similarity metric thus influence how this space is partitioned into regions of local structural similarity. These regions, known as domains of applicability, are the ones most likely to be modeled successfully by a structure-activity relationship. In this work, the network topology and scaling relationships of chemistry spaces are first investigated independently of any specific biological activity. The chemistry spaces studied include the ZINC data set, a qHTS PubChem bioassay, and the space of protein binding sites from the PDB. The characteristics of these networks are compared and contrasted with those of the bioassay SALI subnetwork, which maps discontinuities, or cliffs, in the structure-activity landscape. Mapping the locations of activity cliffs, and comparing the global characteristics of SALI subnetworks with those of the underlying chemistry space networks generated using different representations, can guide the choice of a better representation. A higher local density of SALI edges under a particular representation indicates that the structure-activity relationship is more challenging to model with that fingerprint in that region of chemistry space.
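The relationship between a chemistry space network and its SALI subnetwork can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that fingerprints are given as sets of on-bit indices, that similarity is the Tanimoto coefficient, and that SALI takes its commonly cited form, |A_i - A_j| / (1 - sim(i, j)), which the abstract itself does not spell out. The cutoffs `sim_cutoff` and `sali_cutoff`, the restriction of SALI edges to pairs already linked in the similarity network, and the `local_sali_density` measure are all hypothetical choices made for illustration.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sali(act_i, act_j, sim):
    """Assumed SALI form: |A_i - A_j| / (1 - sim).

    Identical fingerprints (sim == 1) give infinity when the activities
    differ and 0.0 when they do not.
    """
    diff = abs(act_i - act_j)
    if sim >= 1.0:
        return float("inf") if diff > 0 else 0.0
    return diff / (1.0 - sim)


def build_networks(fps, activities, sim_cutoff=0.5, sali_cutoff=2.0):
    """Return (similarity_edges, sali_edges) as lists of index pairs.

    Assumption: a SALI edge is only considered between molecules that are
    already neighbours in the similarity network, so the SALI network is a
    subnetwork of the chemistry space network.
    """
    similarity_edges, sali_edges = [], []
    for i, j in combinations(range(len(fps)), 2):
        sim = tanimoto(fps[i], fps[j])
        if sim < sim_cutoff:
            continue
        similarity_edges.append((i, j))
        if sali(activities[i], activities[j], sim) >= sali_cutoff:
            sali_edges.append((i, j))
    return similarity_edges, sali_edges


def local_sali_density(node, similarity_edges, sali_edges):
    """Hypothetical local measure: fraction of a node's similarity edges
    that are also SALI edges (0.0 for an isolated node)."""
    degree = sum(node in edge for edge in similarity_edges)
    cliffs = sum(node in edge for edge in sali_edges)
    return cliffs / degree if degree else 0.0


# Toy data: molecules 0 and 1 are structurally similar but differ in
# activity; molecule 2 shares no bits with either of them.
fps = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}), frozenset({7, 8, 9})]
activities = [5.0, 8.0, 5.1]
similarity_edges, sali_edges = build_networks(fps, activities)
```

Running the same procedure with two different fingerprint representations and comparing the resulting values of `local_sali_density` per node is one simple way to see where a given representation produces more activity cliffs.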