Ye Hu, Eugen Lounkine, José Batista, Jürgen Bajorath
Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität Bonn, Dahlmannstr. 2, D-53113 Bonn, Germany.
Chem Biol Drug Des. 2008 Nov;72(5):341-9. doi: 10.1111/j.1747-0285.2008.00723.x.
This paper reports the design and evaluation of structural key-type fingerprints that consist of only 10-30 substructures isolated from randomly generated fragment populations of different classes of active compounds. To identify minimal sets of fragments that carry substantial compound class-specific information, fragment frequency calculations are used to guide fingerprint generation. These compound class-directed and extremely small structural fingerprints push the design of so-called mini-fingerprints to the limit and are the shortest bit string fingerprints reported to date. To apply these relative frequency-based activity class characteristic substructure fingerprints, a bit density-dependent similarity metric is introduced that makes it possible to adjust similarity coefficients for individual compound classes and to balance the recall of active compounds against database selection size. In similarity search trials, these small compound class-directed fingerprints enrich active compounds in relatively small database selection sets and approach or exceed the performance of widely used structural fingerprints of much larger size and higher complexity.
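The abstract does not give the exact formulas, so the following Python sketch only illustrates the general workflow it describes, under explicit assumptions. Each compound is assumed to be pre-fragmented into a set of fragment identifiers (fragment generation is not shown). The "relative frequency" of a fragment is assumed here to be its occurrence frequency among actives divided by its frequency in a background set, with a small hypothetical smoothing term `eps`. The top `k` fragments become the bit positions of a short structural key fingerprint. The paper's bit density-dependent similarity metric is not reproduced; plain Tanimoto similarity with an adjustable threshold stands in for it, and that threshold is the knob that trades recall against selection size. All function names (`select_class_fragments`, `encode`, `screen`) are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Assumption: a compound is represented by the set of fragment identifiers
# (e.g. canonical fragment SMILES) it contains.
Compound = FrozenSet[str]
Fingerprint = Tuple[int, ...]


def fragment_frequencies(compounds: Sequence[Compound]) -> Dict[str, float]:
    """Fraction of compounds in which each fragment occurs."""
    counts = Counter(frag for comp in compounds for frag in comp)
    n = len(compounds)
    return {frag: cnt / n for frag, cnt in counts.items()}


def select_class_fragments(actives: Sequence[Compound],
                           background: Sequence[Compound],
                           k: int,
                           eps: float = 1e-3) -> List[str]:
    """Keep the k fragments with the highest active/background frequency ratio.

    eps is a hypothetical smoothing term that avoids division by zero for
    fragments that never occur in the background set.
    """
    f_act = fragment_frequencies(actives)
    f_bg = fragment_frequencies(background)
    scores = {frag: fa / (f_bg.get(frag, 0.0) + eps) for frag, fa in f_act.items()}
    # Descending score; ties broken alphabetically so the result is deterministic.
    ranked = sorted(scores, key=lambda frag: (-scores[frag], frag))
    return ranked[:k]


def encode(compound: Compound, keys: Sequence[str]) -> Fingerprint:
    """One bit per selected fragment: 1 if the compound contains it, else 0."""
    return tuple(1 if key in compound else 0 for key in keys)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Standard Tanimoto coefficient on binary vectors (0.0 if both are empty)."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0


def screen(reference_actives: Sequence[Compound],
           database: Sequence[Compound],
           keys: Sequence[str],
           threshold: float) -> List[int]:
    """Indices of database compounds whose best Tanimoto similarity to any
    reference active reaches the threshold (nearest-neighbor fusion)."""
    refs = [encode(comp, keys) for comp in reference_actives]
    hits = []
    for i, comp in enumerate(database):
        fp = encode(comp, keys)
        if max(tanimoto(fp, ref) for ref in refs) >= threshold:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Toy data with made-up fragment labels.
    actives = [frozenset({"a", "b", "c"}), frozenset({"a", "b", "d"}),
               frozenset({"a", "c", "e"})]
    background = [frozenset({"c", "x"}), frozenset({"d", "y"}),
                  frozenset({"x", "y"}), frozenset({"e", "z"})]
    keys = select_class_fragments(actives, background, k=3)
    database = [frozenset({"a", "b"}), frozenset({"x", "y"}),
                frozenset({"a", "c", "z"})]
    print(keys)                                   # ['a', 'b', 'c']
    print(screen(actives, database, keys, 0.8))   # [0, 2]
```

Raising `threshold` shrinks the selection set at the cost of recall; lowering it does the opposite. With only a handful of bits, Tanimoto values take few distinct levels, which is one motivation for the class-specific, density-aware adjustment the abstract describes.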