Swarit Jasial, Ye Hu, Martin Vogt, Jürgen Bajorath
Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany.
F1000Res. 2016 Apr 6;5. doi: 10.12688/f1000research.8357.2. eCollection 2016.
A largely unsolved problem in chemoinformatics, and one central to many applications, is how calculated compound similarity relates to activity similarity. In general, activity relationships are predicted from calculated similarity values. However, there is no solid scientific foundation linking calculated molecular similarity to observed activity similarity. Accordingly, the success rate of identifying new active compounds by similarity searching is limited. Although various attempts have been made to establish relationships between calculated fingerprint similarity values and biological activities, none has yielded generally applicable rules for similarity searching. In this study, we have addressed the question of molecular versus activity similarity in a more fundamental way. First, we have evaluated whether activity-relevant similarity value ranges can in principle be identified for standard fingerprints and distinguished from similarity values resulting from random compound comparisons. Then, we have analyzed whether activity-relevant similarity values can be used to guide typical similarity search calculations aiming to identify active compounds in databases. It was found that activity-relevant similarity values can be identified as a characteristic feature of fingerprints. However, it was also shown that such values cannot be reliably used as thresholds in practical similarity search calculations. In addition, the analysis presented herein helps to rationalize differences in fingerprint search performance.
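To make the terms "fingerprint similarity value" and "threshold" concrete, the sketch below shows one common way such values are calculated and applied in a similarity search. It is an illustration under stated assumptions, not the authors' protocol. It assumes fingerprints are represented as sets of "on" bit positions and uses the Tanimoto coefficient, a widely used metric for binary fingerprints. The function names `tanimoto` and `threshold_search`, the toy fingerprints, and the 0.5 cutoff are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: a fingerprint is the set of its "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two binary fingerprints."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention when both fingerprints are empty
    return len(a & b) / union


def threshold_search(
    query: Fingerprint, database: Dict[str, Fingerprint], cutoff: float
) -> List[Tuple[str, float]]:
    """Return compounds with similarity >= cutoff, ranked by decreasing similarity."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [(name, sim) for name, sim in scored if sim >= cutoff]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    query = frozenset({1, 4, 7, 9, 12})
    database = {
        "cpd_A": frozenset({1, 4, 7, 9, 15}),  # Tanimoto with query: 4/6
        "cpd_B": frozenset({2, 3, 5}),         # Tanimoto with query: 0/8
        "cpd_C": frozenset({1, 4, 12}),        # Tanimoto with query: 3/5
    }
    # Hypothetical fixed cutoff of 0.5: cpd_A and cpd_C pass, cpd_B does not.
    print(threshold_search(query, database, cutoff=0.5))
```

The study reports that activity-relevant similarity values cannot reliably serve as such fixed cutoffs in practice, so the 0.5 used here is arbitrary and only illustrates the mechanics of a threshold-based search.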