Xue Ling, Stahura Florence L, Bajorath Jürgen
Department of Computer-Aided Drug Discovery, Albany Molecular Research, Inc., AMRI Bothell Research Center, 18804 North Creek Parkway, Bothell, Washington 98011-8012, USA.
J Chem Inf Comput Sci. 2004 Nov-Dec;44(6):2032-9. doi: 10.1021/ci0400819.
Fingerprint scaling is a method for improving the performance of similarity search calculations. It is based on the detection of bit patterns in keyed fingerprints that are signatures of specific compound classes. Applying scaling factors to consensus bits, that is, bits that are mostly set on, emphasizes signature bit patterns during similarity searching and has been shown to improve search results for different fingerprints. Similarity search profiling has recently been introduced as a method for analyzing similarity search calculations. Profiles separately monitor correctly identified hits and other detected database compounds as a function of similarity threshold values, making it possible to estimate whether virtual screening calculations can succeed or to evaluate why they fail. Here, we have applied similarity search profiling to study fingerprint scaling in detail and to better understand the effects responsible for its performance, focusing on the qualitative and quantitative analysis of similarity search profiles under scaling conditions. To this end, we carried out systematic similarity search calculations for 23 biological activity classes over a wide range of scaling factors in a compound database containing approximately 1.3 million molecules and monitored these calculations in similarity search profiles. Analysis of these profiles confirmed increases in hit rates as a consequence of scaling and revealed that scaling influences similarity search calculations in different ways. Based on scaled similarity search profiles, compound sets could be divided into different categories. In a number of cases, increases in search performance under scaling conditions were due to a larger relative increase in correctly identified hits than in detected false positives. This was consistent with the finding that preferred similarity threshold values increased due to fingerprint scaling, which was well illustrated by similarity search profiling.
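The two ideas described in the abstract, scaling of consensus bits and profiling of search results over similarity thresholds, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the 90% cutoff used to call a bit a consensus bit, the use of a weighted Tanimoto coefficient as the scaled similarity measure, and the toy four-bit fingerprints.

```python
from typing import List, Sequence, Tuple

Fingerprint = Sequence[int]  # keyed fingerprint as a list of 0/1 bits


def consensus_bits(actives: Sequence[Fingerprint], min_fraction: float = 0.9) -> List[int]:
    """Return bit positions set on in at least `min_fraction` of the actives.

    The 0.9 cutoff is an illustrative assumption, not a value from the paper.
    """
    n = len(actives)
    n_bits = len(actives[0])
    return [i for i in range(n_bits)
            if sum(fp[i] for fp in actives) / n >= min_fraction]


def scaling_weights(n_bits: int, consensus: Sequence[int], factor: float) -> List[float]:
    """Weight `factor` for consensus bits, 1.0 for all other bits."""
    chosen = set(consensus)
    return [factor if i in chosen else 1.0 for i in range(n_bits)]


def scaled_tanimoto(a: Fingerprint, b: Fingerprint, weights: Sequence[float]) -> float:
    """Weighted Tanimoto coefficient (assumed form of the scaled similarity)."""
    both = sum(w for x, y, w in zip(a, b, weights) if x and y)
    either = sum(w for x, y, w in zip(a, b, weights) if x or y)
    return both / either if either else 0.0


def search_profile(query: Fingerprint,
                   database: Sequence[Fingerprint],
                   is_active: Sequence[bool],
                   weights: Sequence[float],
                   thresholds: Sequence[float]) -> List[Tuple[float, int, int]]:
    """For each threshold, count known actives (hits) and other database
    compounds whose similarity to the query is at or above that threshold."""
    sims = [scaled_tanimoto(query, fp, weights) for fp in database]
    profile = []
    for t in thresholds:
        hits = sum(1 for s, act in zip(sims, is_active) if s >= t and act)
        others = sum(1 for s, act in zip(sims, is_active) if s >= t and not act)
        profile.append((t, hits, others))
    return profile


# Toy data: three actives of one hypothetical class and a three-compound database.
actives = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 0, 1]]
database = [[1, 1, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0]]
is_active = [True, False, True]
query = [1, 1, 0, 0]

weights = scaling_weights(4, consensus_bits(actives), factor=3.0)
profile = search_profile(query, database, is_active, weights, [0.0, 0.5, 0.9])
```

In this toy case, the similarity between `[1, 1, 0, 0]` and `[1, 1, 1, 0]` rises from 2/3 without scaling to 6/7 with a factor of 3 on the two consensus bits, so an active compound stays above a higher threshold. This is the kind of shift in preferred threshold values that a profile makes visible.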