Vogt Martin, Bajorath Jürgen
Department of Life Science Informatics, Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Dahlmannstr. 2, D-53113, Bonn, Germany.
Mol Inform. 2017 Jul;36(7). doi: 10.1002/minf.201600131. Epub 2016 Dec 29.
Similarity searching using molecular fingerprints has a long history in chemoinformatics and remains a popular approach to virtual screening. Typically, known active reference molecules are used to search databases for new active compounds. However, such searches have a black-box character because similarity value distributions depend on both the fingerprint and the compound class. Consequently, no generally applicable similarity threshold values are available as reliable indicators of activity relationships between reference and database compounds, and it is generally uncertain where new active compounds might appear in database rankings, if at all. In this contribution, methods are discussed for modeling the similarity value distributions of fingerprint search calculations using Tanimoto coefficients and for estimating the rank positions of active compounds. To our knowledge, these are the first approaches for predicting the results of fingerprint-based similarity searching.
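For orientation, the kind of search whose outcome the abstract refers to can be sketched as follows. This is a minimal illustration, not the authors' modeling method: it assumes fingerprints are represented as sets of on-bit indices, uses the standard Tanimoto coefficient Tc(A, B) = |A ∩ B| / |A ∪ B|, and ranks a toy database against a single reference. The function names `tanimoto` and `rank_database`, the compound labels, and the bit sets are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # assumed convention for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_database(reference, database):
    """Return (name, similarity) pairs sorted by decreasing similarity to the reference."""
    scored = [(name, tanimoto(reference, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up fingerprints for illustration only (not real compounds).
reference = frozenset({1, 4, 7, 9})
database = {
    "cpd_A": frozenset({1, 4, 7, 9, 12}),  # 4 shared bits, 5 in the union
    "cpd_B": frozenset({1, 4, 20, 33}),    # 2 shared bits, 6 in the union
    "cpd_C": frozenset({50, 51}),          # no shared bits
}

ranking = rank_database(reference, database)
for position, (name, score) in enumerate(ranking, start=1):
    print(position, name, round(score, 3))
```

The resulting scores depend entirely on the chosen fingerprint and on the compounds involved, which is why a fixed similarity threshold does not transfer between searches and why the rank position of an active compound is hard to anticipate.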