Britta Nisius, Martin Vogt, Jürgen Bajorath
Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, D-53113 Bonn, Germany.
J Chem Inf Model. 2009 Jun;49(6):1347-58. doi: 10.1021/ci900087y.
The contribution of individual fingerprint bit positions to similarity search performance is systematically evaluated. A method is introduced to determine bit significance on the basis of Kullback-Leibler divergence analysis of bit distributions in active and database compounds. Bit divergence analysis and Bayesian compound screening share a common methodological foundation. Hence, given the significance ranking of all individual bit positions comprising a fingerprint, subsets of bits are evaluated in the context of Bayesian screening, and minimal fingerprint representations are determined that meet or exceed the search performance of unmodified fingerprints. For fingerprints of different design evaluated on many compound activity classes, we consistently find that search performance is determined by subsets of fingerprint bit positions. Some of these subsets are very small, in certain cases comprising only a few bit positions. Structural or pharmacophore patterns captured by preferred bit positions can often be directly associated with characteristic features of active compounds. In some cases, reduced fingerprint representations clearly exceed the search performance of the original fingerprints. Thus, fingerprint reduction likely represents a promising approach for practical applications.
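The bit-ranking and reduced-fingerprint screening steps summarized above can be illustrated with a small Python sketch. It assumes that each bit position is an independent Bernoulli variable whose frequencies in active and database compounds are Laplace-smoothed. Each bit is scored by the Kullback-Leibler divergence D(active || database), and a naive Bayes log-likelihood ratio over the top-ranked bits serves as the Bayesian screening score. The function names, the pseudocount, the two-bit cutoff and the toy fingerprints are all hypothetical. This is not the authors' implementation, and their exact estimators may differ.

```python
import math
from typing import List, Sequence, Tuple

# A fingerprint is a fixed-length sequence of 0/1 bit values.
Fingerprint = Sequence[int]


def bit_frequencies(fps: List[Fingerprint], pseudocount: float = 1.0) -> List[float]:
    """Estimate P(bit i = 1) for each position, with Laplace smoothing
    (an assumed choice) so no probability is exactly 0 or 1."""
    n = len(fps)
    n_bits = len(fps[0])
    return [
        (sum(fp[i] for fp in fps) + pseudocount) / (n + 2 * pseudocount)
        for i in range(n_bits)
    ]


def bit_kl_divergence(p_act: float, p_db: float) -> float:
    """KL divergence D(active || database) between two Bernoulli distributions."""
    return (p_act * math.log(p_act / p_db)
            + (1 - p_act) * math.log((1 - p_act) / (1 - p_db)))


def rank_bits(actives: List[Fingerprint], database: List[Fingerprint]
              ) -> Tuple[List[int], List[float], List[float]]:
    """Return bit positions sorted by decreasing divergence, plus the
    estimated bit frequencies in active and database compounds."""
    p_a = bit_frequencies(actives)
    p_d = bit_frequencies(database)
    kl = [bit_kl_divergence(a, d) for a, d in zip(p_a, p_d)]
    order = sorted(range(len(kl)), key=lambda i: kl[i], reverse=True)
    return order, p_a, p_d


def bayes_score(fp: Fingerprint, bits: List[int],
                p_a: List[float], p_d: List[float]) -> float:
    """Naive Bayes log-likelihood ratio (active vs. database), summed only
    over the selected bit positions, i.e. a reduced fingerprint."""
    score = 0.0
    for i in bits:
        if fp[i]:
            score += math.log(p_a[i] / p_d[i])
        else:
            score += math.log((1 - p_a[i]) / (1 - p_d[i]))
    return score


# Toy data (invented for illustration): 4-bit fingerprints.
actives = [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 0]]
database = [[0, 1, 0, 1], [0, 1, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]

order, p_a, p_d = rank_bits(actives, database)
print(order)  # -> [0, 2, 3, 1]

top_bits = order[:2]  # keep only the two most divergent positions (hypothetical cutoff)
candidates = [[1, 0, 0, 1], [0, 1, 1, 1]]
ranked = sorted(candidates, key=lambda fp: bayes_score(fp, top_bits, p_a, p_d),
                reverse=True)
print(ranked[0])  # -> [1, 0, 0, 1]
```

In this sketch the shared foundation is explicit: if active compounds follow the estimated bit frequencies, the expected value of the per-bit term in `bayes_score` equals that bit's divergence. Dropping low-divergence bits therefore removes the terms that, on average, contribute least to separating actives from database compounds.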