Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA.
Bioinformatics. 2013 Nov 1;29(21):2792-4. doi: 10.1093/bioinformatics/btt475. Epub 2013 Aug 20.
The ability to accurately measure structural similarities among small molecules is important for many analysis routines in drug discovery and chemical genomics. Algorithms used for this purpose include fragment-based fingerprint methods and graph-based maximum common substructure (MCS) methods. MCS approaches provide one of the most accurate similarity measures. However, their rigid matching policies limit them to the identification of perfect MCSs. To eliminate this restriction, we introduce a new mismatch-tolerant search method for identifying flexible MCSs (FMCSs) containing a user-definable number of atom and/or bond mismatches.
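To make the link between MCS size and similarity concrete, the following minimal sketch converts the sizes of two compounds and of their common substructure into two widely used coefficients, Tanimoto and overlap. The abstract does not state which coefficient fmcsR reports, so the choice of coefficients, the function name `mcs_similarity`, and the atom counts in the usage note are assumptions made for illustration only.

```python
def mcs_similarity(size_a, size_b, size_mcs):
    """Return (tanimoto, overlap) for two compounds and their common substructure.

    Sizes are atom counts. This is an illustrative helper, not part of fmcsR.
    """
    if size_a <= 0 or size_b <= 0:
        raise ValueError("compound sizes must be positive")
    if size_mcs < 0 or size_mcs > min(size_a, size_b):
        raise ValueError("the common substructure cannot exceed the smaller compound")

    # Tanimoto: shared atoms divided by the atoms in the union of both compounds.
    tanimoto = size_mcs / (size_a + size_b - size_mcs)
    # Overlap: shared atoms divided by the size of the smaller compound.
    overlap = size_mcs / min(size_a, size_b)
    return tanimoto, overlap


# Hypothetical atom counts: a strict MCS of 15 atoms versus a flexible MCS
# of 18 atoms for the same pair of compounds (20 and 25 atoms).
strict_scores = mcs_similarity(20, 25, 15)
flexible_scores = mcs_similarity(20, 25, 18)
```

Under these assumed counts, the larger flexible substructure yields higher values for both coefficients, which illustrates why tolerating a few mismatches can change the ranking of a similarity search.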
The fmcsR package provides an R interface, with the time-consuming steps of the FMCS algorithm implemented in C++. It includes utilities for pairwise compound comparisons, structure similarity searching, clustering and visualization of MCSs. In comparison with an existing MCS tool, fmcsR shows better time performance over a wide range of compound sizes. When mismatching of atoms or bonds is turned on, the compute times increase as expected, and the resulting FMCSs are often substantially larger than their strict MCS counterparts. Based on extensive virtual screening (VS) tests, the flexible matching feature enhances the enrichment of active structures at the top of MCS-based similarity search results. With respect to overall and early enrichment performance, FMCS outperforms most of the seven other VS methods considered in these tests.
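Early enrichment in a virtual screen is commonly summarized with an enrichment factor: the fraction of actives among the top-ranked compounds divided by the fraction of actives in the whole library. The sketch below computes it for a ranked list. The abstract does not specify which enrichment metrics were used in the VS tests, so this particular metric, the function name `enrichment_factor`, and the example ranking are assumptions for illustration.

```python
def enrichment_factor(ranked_is_active, top_n):
    """Enrichment factor for the first `top_n` entries of a ranked screen.

    `ranked_is_active` lists activity labels ordered from most to least
    similar to the query. Illustrative helper, not part of fmcsR.
    """
    total = len(ranked_is_active)
    if not 0 < top_n <= total:
        raise ValueError("top_n must be between 1 and the library size")

    total_actives = sum(1 for label in ranked_is_active if label)
    if total_actives == 0:
        raise ValueError("the library contains no actives")

    actives_in_top = sum(1 for label in ranked_is_active[:top_n] if label)
    # Hit rate among the top-ranked compounds relative to the library-wide hit rate.
    return (actives_in_top / top_n) / (total_actives / total)


# Hypothetical ranking of 10 compounds containing 3 actives.
ranking = [True, True, False, True, False, False, False, False, False, False]
ef_top2 = enrichment_factor(ranking, 2)
```

A value above 1 means actives are concentrated near the top of the ranking, and a value of 1 corresponds to a ranking that is no better than random selection.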
fmcsR is freely available for all common operating systems from the Bioconductor site (http://www.bioconductor.org/packages/devel/bioc/html/fmcsR.html).
Supplementary data are available at Bioinformatics online.