Department of Life Science Informatics, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany.
J Chem Inf Model. 2011 Sep 26;51(9):2254-65. doi: 10.1021/ci200275m. Epub 2011 Aug 8.
Independent studies have shown that two-dimensional (2D) fingerprints have scaffold hopping ability in virtual screening, even though these descriptors primarily emphasize the structural and/or topological resemblance of reference and database compounds. However, the mechanism by which such fingerprints enrich structurally diverse molecules in database selection sets is still poorly understood. To address this question, similarity search calculations were carried out on 120 compound activity classes of varying structural diversity using atom environment fingerprints. Two feature selection methods, Kullback-Leibler divergence and gain ratio analysis, were applied to systematically reduce these fingerprints and generate alternative versions for searching. Gain ratio, an information-theoretic feature selection method, had not previously been considered in fingerprint analysis; here it is shown to be an effective approach for selecting fingerprint features. Following comparative feature selection and similarity searching, the compound recall characteristics of the original and reduced fingerprint versions were analyzed in detail. Small sets of fingerprint features were found to distinguish subsets of active compounds from other database molecules. The compound recall of fingerprint similarity searching often resulted from the cumulative detection of distinct compound subsets by different fingerprint features, which provides a rationale for the scaffold hopping potential of these 2D fingerprints.
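The abstract names two criteria for scoring fingerprint features before searching with reduced fingerprints. The Python sketch below shows one way each could score binary features and how a reduced fingerprint might then be used to rank database compounds. It rests on assumptions the abstract does not specify: fingerprints are Python sets of hypothetical feature IDs; Kullback-Leibler divergence is computed per feature as the Bernoulli divergence between Laplace-smoothed feature frequencies in actives and in the background; gain ratio follows the C4.5 definition (information gain divided by split information); and the Tanimoto coefficient is the similarity measure. Treat it as an illustration of the techniques, not the authors' implementation.

```python
import math

# Illustrative sketch only: scoring and search details are assumptions,
# not the implementation used in the paper.

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def bernoulli_kl(p, q):
    """KL divergence D(p || q), in bits, between Bernoulli(p) and Bernoulli(q)."""
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:
            total += a * math.log2(a / b)
    return total

def feature_freq(fps, feature):
    """Laplace-smoothed fraction of fingerprints (feature sets) containing `feature`."""
    count = sum(feature in fp for fp in fps)
    return (count + 1) / (len(fps) + 2)

def kl_scores(actives, background, features):
    """Score each feature by how far its frequency in actives diverges from the background."""
    return {f: bernoulli_kl(feature_freq(actives, f), feature_freq(background, f))
            for f in features}

def gain_ratio(actives, inactives, feature):
    """C4.5-style gain ratio of a binary feature for the active/inactive class label."""
    n_a, n_i = len(actives), len(inactives)
    n = n_a + n_i
    h_class = entropy([n_a / n, n_i / n])
    on_a = sum(feature in fp for fp in actives)
    on_i = sum(feature in fp for fp in inactives)
    cond, split_probs = 0.0, []
    # Partition compounds by presence / absence of the feature.
    for a, i in ((on_a, on_i), (n_a - on_a, n_i - on_i)):
        size = a + i
        if size:
            split_probs.append(size / n)
            cond += size / n * entropy([a / size, i / size])
    split_info = entropy(split_probs)
    if split_info == 0:
        return 0.0  # feature set in all or no compounds carries no information
    return (h_class - cond) / split_info

def select_top(scores, k):
    """Keep the k highest-scoring features (ties broken by feature name)."""
    return set(sorted(scores, key=lambda f: (-scores[f], f))[:k])

def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def search(reference, database, keep):
    """Rank database compounds by Tanimoto similarity restricted to features in `keep`."""
    ref = reference & keep
    return sorted(database, key=lambda name: -tanimoto(ref, database[name] & keep))

# Toy data: hypothetical atom-environment feature IDs, not real fingerprints.
actives = [{"a", "b", "x"}, {"a", "c", "y"}, {"a", "b", "z"}]
inactives = [{"x", "y"}, {"b", "y", "z"}, {"c", "x"}, {"y", "z"}]
features = set().union(*actives, *inactives)

gr = {f: gain_ratio(actives, inactives, f) for f in features}
kl = kl_scores(actives, inactives, features)
keep = select_top(gr, 1)  # reduced fingerprint: the single best feature
database = {f"act{i + 1}": fp for i, fp in enumerate(actives)}
database.update({f"dec{i + 1}": fp for i, fp in enumerate(inactives)})
ranking = search(actives[0], database, keep)
print(keep, ranking[:3])
```

In this toy set, feature `"a"` occurs in every active and in no inactive, so it reaches the maximum gain ratio of 1.0 and also the highest KL score. Searching with that one-feature fingerprint ranks all three actives first, which mirrors on a small scale the observation that small feature subsets can separate groups of actives from other database molecules.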