Wang Yuan, Bajorath Jürgen
Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology & Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstrasse 2, D-53113 Bonn, Germany.
J Chem Inf Model. 2008 Jan;48(1):75-84. doi: 10.1021/ci700314x. Epub 2007 Dec 15.
Differences in molecular complexity and size are known to bias the evaluation of fingerprint similarity. For example, complex molecules tend to produce fingerprints with higher bit density than simple ones, which often leads to artificially high similarity values in search calculations. We introduce here a variant of the Tversky coefficient that makes it possible to modulate or eliminate molecular complexity effects when evaluating fingerprint similarity. This has enabled us to study in detail the role of molecular complexity in similarity searching and the relationship between reference and active database compounds. Balancing complexity effects leads to constant distributions of similarity values for reference and database molecules, independent of how compound contributions are weighted. When searching for active compounds with varying complexity, hit rates can be optimized by modulating complexity effects, rather than eliminating them, and adjusting relative compound weights. For reference molecules and active database compounds having different complexity, preferred parameter settings are identified.
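The complexity bias described above can be illustrated with the standard Tversky coefficient. The following is a minimal sketch, not the variant introduced in the paper, whose formula the abstract does not give. The function name `tversky`, the toy bit sets, and the weight names `alpha` and `beta` are assumptions made for illustration; fingerprints are modeled as plain sets of "on" bit positions.

```python
def tversky(ref_bits, db_bits, alpha, beta):
    """Standard Tversky coefficient for two fingerprints given as sets of on-bits.

    alpha weights bits found only in the reference molecule,
    beta weights bits found only in the database molecule.
    alpha = beta = 1 gives the Tanimoto coefficient.
    """
    ref, db = set(ref_bits), set(db_bits)
    common = len(ref & db)       # bits set in both fingerprints
    only_ref = len(ref - db)     # bits set only in the reference
    only_db = len(db - ref)      # bits set only in the database molecule
    denom = common + alpha * only_ref + beta * only_db
    return common / denom if denom else 0.0


# Hypothetical toy fingerprints (illustrative only, not real molecules).
reference = {1, 2, 3, 4}
simple_db = {1, 2}                        # low bit density
complex_db = {1, 2, 3, 4, 5, 6, 7, 8}     # high bit density

# Symmetric weighting (Tanimoto): both database molecules score the same.
tanimoto_simple = tversky(reference, simple_db, 1.0, 1.0)
tanimoto_complex = tversky(reference, complex_db, 1.0, 1.0)

# Weighting only the reference: the bit-dense molecule is not penalized
# for its extra bits, so it reaches the maximum score.
ref_weighted_simple = tversky(reference, simple_db, 1.0, 0.0)
ref_weighted_complex = tversky(reference, complex_db, 1.0, 0.0)

print(tanimoto_simple, tanimoto_complex)
print(ref_weighted_simple, ref_weighted_complex)
```

In this toy case, shifting all weight onto the reference molecule lets the bit-dense database fingerprint reach a higher score than the sparse one, even though the two are rated equally under symmetric weighting. This is the kind of complexity effect the abstract says the proposed variant is designed to modulate or eliminate.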