Wang Yuan, Eckert Hanna, Bajorath Jürgen
Department of Life Science Informatics, Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Dahlmannstr. 2, 53113 Bonn, Germany.
ChemMedChem. 2007 Jul;2(7):1037-42. doi: 10.1002/cmdc.200700050.
Recently, systematic similarity calculations using Tversky coefficients have suggested that putting higher weight on the bit settings of active reference molecules (templates) than on those of database compounds increases hit rates in similarity searching using 2D fingerprints. These findings have been interpreted as evidence for "asymmetry" in chemical similarity searching. We have thoroughly analyzed this phenomenon and demonstrate that apparent asymmetry in similarity search calculations is a direct consequence of differences in fingerprint bit densities, which often correlate with differences in molecular size. Accordingly, a size-independent fingerprint with constant bit density does not produce asymmetrical search results. For Tversky similarity calculations, differences in fingerprint bit densities between active and inactive compounds determine which weighting factors produce high hit rates.
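The weighting effect described above can be illustrated with a minimal sketch of the Tversky coefficient on fingerprints represented as sets of "on" bit positions. The function name `tversky`, the parameter names `alpha` and `beta`, and the toy fingerprints are assumptions made for illustration; they are not taken from the paper. The sketch assumes the common convention in which `alpha` weights bits set only in the reference molecule and `beta` weights bits set only in the database compound.

```python
def tversky(ref_bits, db_bits, alpha, beta):
    """Tversky similarity between two fingerprints given as sets of on-bits.

    alpha weights bits unique to the reference (template),
    beta weights bits unique to the database compound.
    alpha = beta = 1 reduces to the Tanimoto coefficient.
    """
    ref, db = set(ref_bits), set(db_bits)
    common = len(ref & db)      # bits set in both fingerprints
    only_ref = len(ref - db)    # bits set only in the reference
    only_db = len(db - ref)     # bits set only in the database compound
    denom = common + alpha * only_ref + beta * only_db
    return common / denom if denom else 0.0


# Hypothetical toy fingerprints with different bit densities.
template = {1, 2, 3, 4}
sparse_db = {1, 2}                      # fewer bits set than the template
dense_db = {1, 2, 3, 4, 5, 6, 7, 8}     # more bits set than the template

# Full weight on the template (alpha=1, beta=0) favors the dense compound.
ref_weighted = (tversky(template, sparse_db, 1.0, 0.0),
                tversky(template, dense_db, 1.0, 0.0))

# Full weight on the database compound (alpha=0, beta=1) favors the sparse one.
db_weighted = (tversky(template, sparse_db, 0.0, 1.0),
               tversky(template, dense_db, 0.0, 1.0))

# Symmetric weighting (Tanimoto) scores both compounds equally here.
symmetric = (tversky(template, sparse_db, 1.0, 1.0),
             tversky(template, dense_db, 1.0, 1.0))
```

In this toy case the ranking of the two database compounds flips with the weighting only because their bit densities differ from that of the template, which is consistent with the abstract's explanation of apparent asymmetry.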