Strand Life Sciences, Fifth Floor, Kirloskar Business Park, Bellary Road, Hebbal, Bangalore 560024, India.
J Chem Inf Model. 2011 Apr 25;51(4):788-806. doi: 10.1021/ci100297y. Epub 2011 Mar 29.
Several efficient correspondence graph-based algorithms for determining the maximum common substructure (MCS) of a pair of molecules have been published in the literature. The extension of the problem to three or more molecules is however nontrivial; heuristics used to increase the efficiency in the two-molecule case are either inapplicable to the many-molecule case or do not provide significant speedups. Our specific algorithmic contribution is two-fold. First, we show how the correspondence graph approach for the two-molecule case can be generalized to obtain an algorithm that is guaranteed to find the optimum connected MCS of multiple molecules, and that runs fast on most families of molecules using a new divide-and-conquer strategy that has hitherto not been reported in this context. Second, we provide a characterization of those compound families for which the algorithm might run slowly, along with a heuristic for speeding up computations on these families. We also extend the above algorithm to a heuristic algorithm to find the disconnected MCS of multiple molecules and to an algorithm for clustering molecules into groups, with each group sharing a substantial MCS. Our methods are flexible in that they provide exquisite control on various matching criteria used to define a common substructure.
已经有一些基于有效配对比对图的算法被发表出来,用于确定一对分子的最大公共子结构(MCS)。然而,将问题扩展到三个或更多分子并不简单;在二分子情况下用于提高效率的启发式方法要么不适用于多分子情况,要么不能提供显著的加速。我们的具体算法贡献有两方面。首先,我们展示了如何将二分子情况下的配对比对图方法推广,以获得一种保证能够找到多个分子最优连接 MCS 的算法,并且使用一种新的分治策略在大多数分子家族上快速运行,而这种策略在以前的相关研究中并未被报道。其次,我们对算法可能运行缓慢的化合物家族进行了特征描述,并提供了一种启发式方法来加速这些家族的计算。我们还将上述算法扩展到一个启发式算法,用于寻找多个分子的不连接 MCS,以及一个用于将分子聚类成具有共享大量 MCS 的组的算法。我们的方法具有灵活性,它们提供了对用于定义公共子结构的各种匹配标准的精确控制。