Xue Ling, Godden Jeffrey W, Stahura Florence L, Bajorath Jürgen
Department of Computer-Aided Drug Discovery, Albany Molecular Research, Inc., Bothell Research Center, 18804 North Creek Parkway, Bothell, Washington 98011, USA.
J Chem Inf Comput Sci. 2003 Jul-Aug;43(4):1218-25. doi: 10.1021/ci030287u.
The concept of compound class-specific profiling and scaling of molecular fingerprints for similarity searching is discussed and applied to newly designed fingerprint representations. The approach is based on the analysis of characteristic patterns of bits in keyed fingerprints that are set on in compounds having equivalent biological activity. Once a fingerprint profile is generated for a particular activity class, scaling factors weighted according to observed bit frequencies are applied to signature bit positions when searching for similar compounds. In systematic similarity search calculations over 23 diverse activity classes, profile scaling consistently increased the performance of fingerprints containing property descriptors and/or structural keys. A significant improvement of approximately 15% was observed for a new fingerprint consisting of binary encoded molecular property descriptors and structural keys. Under scaling conditions, this fingerprint, termed MP-MFP, correctly recognized on average close to 60% of all active test compounds, with only a few false positives. MP-MFP outperformed MACCS keys and other reference fingerprints. In general, optimum performance in scaling calculations was achieved at higher threshold values of the Tanimoto coefficient than in nonscaled calculations, thereby increasing the search selectivity. Putting relatively high weight on signature bit positions that were always, or almost always, set on was generally found to be the most effective scaling procedure. Analysis of class-specific search performance revealed that profile scaling of MP-MFP improved the similarity search results for each of the 23 activity classes.
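The profiling and scaling idea described above can be illustrated with a minimal Python sketch. The abstract does not give the scaling formula, so everything specific here is an assumption made for illustration: the function names, the frequency cutoff `min_freq` that defines a signature bit, the weight `1 + factor * frequency`, and the use of a weighted Tanimoto coefficient to apply the weights. It should not be read as the authors' implementation.

```python
from typing import List, Sequence

Fingerprint = Sequence[int]  # one 0/1 entry per bit position


def bit_frequencies(actives: List[Fingerprint]) -> List[float]:
    """Fraction of the active compounds in which each bit is set on."""
    n = len(actives)
    return [sum(column) / n for column in zip(*actives)]


def scaling_weights(freqs: Sequence[float],
                    min_freq: float = 0.9,
                    factor: float = 5.0) -> List[float]:
    """Assumed scheme: bits set on in at least `min_freq` of the actives are
    treated as signature bits and weighted by 1 + factor * frequency;
    all other bits keep weight 1."""
    return [1.0 + factor * f if f >= min_freq else 1.0 for f in freqs]


def weighted_tanimoto(a: Fingerprint, b: Fingerprint,
                      weights: Sequence[float]) -> float:
    """Tanimoto coefficient with per-bit weights (assumed way of scaling).
    With all weights equal to 1 this is the ordinary Tanimoto coefficient."""
    common = sum(w for x, y, w in zip(a, b, weights) if x and y)
    union = sum(w for x, y, w in zip(a, b, weights) if x or y)
    return common / union if union else 0.0


# Toy activity class with three active compounds and 4-bit fingerprints.
actives = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 0, 1]]
weights = scaling_weights(bit_frequencies(actives))

query, candidate = [1, 1, 0, 1], [1, 0, 1, 1]
unscaled = weighted_tanimoto(query, candidate, [1.0] * 4)
scaled = weighted_tanimoto(query, candidate, weights)
print(f"unscaled={unscaled:.3f} scaled={scaled:.3f}")
```

In this toy case the candidate shares both signature bits (positions 0 and 3) with the query, so its similarity rises under scaling. This is consistent with the observation that scaled searches perform best at higher Tanimoto thresholds than nonscaled ones.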