Singh Rahul
Department of Computer Science, San Francisco State University, San Francisco, CA 94132, USA.
BMC Cell Biol. 2007 Jul 10;8 Suppl 1(Suppl 1):S6. doi: 10.1186/1471-2121-8-S1-S6.
Discerning the similarity between molecules is a challenging problem in drug discovery as well as in molecular biology. The problem is important because the biochemical characteristics of a molecule are closely related to its structure. Molecular similarity is therefore a key notion in investigations that explore molecular structural space, perform query-retrieval in molecular databases, or build structure-activity models. How molecular similarity is determined depends on the choice of molecular representation. Representations with high descriptive power and physical relevance, such as 3D surface-based descriptors, are currently available, and the information they carry is both surface-based and volumetric. However, most techniques for determining molecular similarity focus on idealized 2D graph-based descriptors because of the complexity of reasoning with more elaborate representations.
This paper addresses the problem of determining similarity when molecules are described using complex surface-based representations. It proposes an intrinsic, spherical representation that systematically maps points on a molecular surface to points on a standard coordinate system (a sphere). Molecular surface properties such as shape, field strengths, and effects due to field superposition can then be captured as distributions on the surface of the sphere. Surface-based molecular similarity is subsequently determined by computing the similarity of the surface-property distributions using a novel formulation of histogram intersection. The similarity formulation is sensitive to the 3D distribution of the surface properties and is also highly efficient to compute.
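The general idea of comparing surface-property distributions on a sphere can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not specify the intrinsic mapping or the novel histogram-intersection formulation, so the code below assumes a simple radial projection from the centroid (which, unlike the proposed representation, depends on molecular orientation), equal-area binning of the sphere, and the classic histogram intersection normalised by the smaller total. The function names `sphere_histogram` and `histogram_intersection`, the bin counts, and the requirement that property values be non-negative are all illustrative assumptions.

```python
import math


def sphere_histogram(points, values, n_z=6, n_phi=12):
    """Accumulate non-negative surface-property values into bins on a sphere.

    Assumption: each surface point is projected radially from the centroid
    onto the unit sphere. Bins are uniform in z (cosine of the polar angle)
    and in azimuth, so every bin covers the same area of the sphere.
    """
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n

    hist = [0.0] * (n_z * n_phi)
    for (x, y, z), v in zip(points, values):
        dx, dy, dz = x - cx, y - cy, z - cz
        r = math.sqrt(dx * dx + dy * dy + dz * dz)
        if r == 0.0:
            continue  # a point at the centroid has no direction
        u = dz / r                                  # in [-1, 1]
        phi = math.atan2(dy, dx) % (2.0 * math.pi)  # azimuth in [0, 2*pi)
        i = min(int((u + 1.0) / 2.0 * n_z), n_z - 1)
        j = min(int(phi / (2.0 * math.pi) * n_phi), n_phi - 1)
        hist[i * n_phi + j] += v
    return hist


def histogram_intersection(h1, h2):
    """Classic histogram intersection, normalised by the smaller total.

    Returns a value in [0, 1] for non-negative histograms of equal length.
    """
    total = min(sum(h1), sum(h2))
    if total == 0.0:
        return 0.0
    return sum(min(a, b) for a, b in zip(h1, h2)) / total


# Toy usage: six "surface points" on the coordinate axes, with two
# hypothetical property assignments.
pts = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
h_a = sphere_histogram(pts, [1, 1, 1, 1, 1, 1])
h_b = sphere_histogram(pts, [2, 0, 0, 0, 0, 0])
similarity = histogram_intersection(h_a, h_b)
```

Once the histograms are built, the comparison is a single pass over the bins, which is consistent with the efficiency argument made above. Any pose independence would have to come from the mapping itself, which this sketch does not attempt to reproduce.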
The proposed method obviates the computationally expensive step of molecular pose optimisation, can incorporate conformational variations, and facilitates highly efficient determination of similarity by directly comparing molecular surfaces and surface-based properties. Retrieval performance, applications in structure-activity modelling of complex biological properties, and comparisons with existing research and commercial methods demonstrate the validity and effectiveness of the approach.