Kuwahara Hiroyuki, Gao Xin
Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia.
J Cheminform. 2021 Mar 23;13(1):27. doi: 10.1186/s13321-021-00506-2.
Two-dimensional (2D) chemical fingerprints are widely used as binary features for quantifying the structural similarity of chemical compounds, an important step in similarity-based virtual screening (VS). Here, using an eigenvalue-based entropy approach, we classified as related those 2D fingerprints that contribute little or nothing to shaping the eigenvalue distribution of the feature matrix, and we examined the degree to which these related 2D fingerprints influence molecular similarity scores calculated with the Tanimoto coefficient. Our analysis identified many related fingerprints in publicly available fingerprint schemes and showed that their presence in the feature set can have substantial effects on the similarity scores and bias the outcome of molecular similarity analysis. Our results have implications for the optimal selection of 2D fingerprints for compound similarity analysis and for the identification of potential hits for compounds with target biological activity in VS.
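To make the effect described above concrete, the following is a minimal sketch of the Tanimoto coefficient on binary fingerprints. It is not the authors' code. The 4-bit fingerprints, the helper `with_copies`, and the choice to return 0.0 when both fingerprints are all zeros are assumptions made for illustration. Appending exact copies of a bit stands in for the extreme case of perfectly related fingerprints; the eigenvalue-based entropy analysis itself is not reproduced here.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    # Bits set in both fingerprints, and bits set in at least one of them.
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: two all-zero fingerprints are given a similarity of 0.0.
    return both / either if either else 0.0


def with_copies(fp, index, n):
    """Hypothetical helper: append n exact copies of bit `index` to fp."""
    return fp + [fp[index]] * n


# Made-up fingerprints for two compounds (illustration only).
query = [1, 1, 0, 1]
candidate = [1, 0, 1, 1]

# Two shared on-bits out of four bits set in either fingerprint.
base = tanimoto(query, candidate)

# Three redundant copies of bit 0, which is set in both compounds,
# add three to both the numerator and the denominator.
inflated = tanimoto(with_copies(query, 0, 3), with_copies(candidate, 0, 3))

print(base, inflated)
```

In this toy case the score rises from 2/4 to 5/7 even though no new structural information was added, which is the kind of bias that related fingerprints can introduce into a similarity ranking.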