Faculty of Information Science and Technology, National University of Malaysia, 43600 UKM Bangi, Malaysia.
J Chem Inf Model. 2010 Aug 23;50(8):1340-9. doi: 10.1021/ci1001235.
This paper discusses the weighting of two-dimensional fingerprints for similarity-based virtual screening, specifically the use of weights that assign greatest importance to the substructural fragments that occur least frequently in the database that is being screened. Virtual screening experiments using the MDL Drug Data Report and World of Molecular Bioactivity databases show that the use of such inverse frequency weighting schemes can result, in some circumstances, in marked increases in screening effectiveness when compared with the use of conventional, unweighted fingerprints. Analysis of the characteristics of the various schemes demonstrates that such weights are best used to weight the fingerprint of the reference structure in a similarity search, with the database structures' fingerprints unweighted. However, the increases in performance resulting from such weights are only observed with structurally homogeneous sets of active molecules; when the actives are diverse, the best results are obtained using conventional, unweighted fingerprints for both the reference structure and the database structures.
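The sketch below shows the kind of search the abstract describes: weights that grow as a fragment becomes rarer in the screened database, applied to the reference fingerprint only, with database fingerprints left binary. The abstract does not give the exact weighting schemes or the similarity coefficient. This is therefore a sketch under assumptions: an IDF-style weight w_i = ln(N / n_i) and the continuous (vector) Tanimoto coefficient. The function names `inverse_frequency_weights`, `weighted_tanimoto` and `screen` are hypothetical.

```python
import math
from typing import List, Sequence

# A fingerprint is a binary vector: 1 if the fragment is present, else 0.
Fingerprint = Sequence[int]


def inverse_frequency_weights(database: Sequence[Fingerprint]) -> List[float]:
    """ASSUMED IDF-style weight w_i = ln(N / n_i) for each fragment bit i.

    N is the number of database structures and n_i the number containing
    fragment i, so rarer fragments receive larger weights. A fragment present
    in every structure gets ln(1) = 0; one absent from all structures gets 0.
    """
    n_structures = len(database)
    weights = []
    for i in range(len(database[0])):
        n_i = sum(fp[i] for fp in database)
        weights.append(math.log(n_structures / n_i) if n_i else 0.0)
    return weights


def weighted_tanimoto(x: Sequence[float], y: Sequence[float]) -> float:
    """Continuous Tanimoto coefficient: sum(xy) / (sum(x^2) + sum(y^2) - sum(xy))."""
    xy = sum(a * b for a, b in zip(x, y))
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    denom = xx + yy - xy
    return xy / denom if denom else 0.0


def screen(reference: Fingerprint,
           database: Sequence[Fingerprint],
           weights: Sequence[float],
           weight_database: bool = False) -> List[int]:
    """Rank database indices by decreasing similarity to the reference.

    The reference fingerprint is always weighted. Database fingerprints stay
    binary unless weight_database is True. Passing all-ones weights gives a
    conventional, unweighted search. Ties are broken by database index.
    """
    ref = [w * b for w, b in zip(weights, reference)]
    scored = []
    for idx, fp in enumerate(database):
        vec = [w * b for w, b in zip(weights, fp)] if weight_database else list(fp)
        scored.append((weighted_tanimoto(ref, vec), idx))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scored]


if __name__ == "__main__":
    # Toy database of four structures described by four fragment bits.
    db = [
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [1, 1, 1, 0],
        [1, 0, 0, 1],
    ]
    query = [1, 1, 0, 1]
    w = inverse_frequency_weights(db)
    print(screen(query, db, w))               # → [3, 0, 2, 1]
    print(screen(query, db, [1.0] * len(w)))  # → [0, 3, 2, 1]
```

In this toy example, fragment 3 occurs in only one structure, so weighting the reference promotes that structure to the top of the ranking. The unweighted search ties it with structure 0. Whether such a reordering helps in practice is exactly what the abstract reports to depend on how structurally homogeneous the active molecules are.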