Škuta C, Cortés-Ciriano I, Dehaen W, Kříž P, van Westen G J P, Tetko I V, Bender A, Svozil D
CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic.
Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK.
J Cheminform. 2020 May 29;12(1):39. doi: 10.1186/s13321-020-00443-6.
An affinity fingerprint is a vector of a compound's affinities or potencies against a reference panel of protein targets. Here, we present the QAFFP fingerprint, a 440-element in silico QSAR-based affinity fingerprint whose components are predicted by Random Forest regression models trained on bioactivity data from the ChEMBL database. Both real-valued (rv-QAFFP) and binary (b-QAFFP) versions of the QAFFP fingerprint were implemented, and their performance in similarity searching, biological activity classification and scaffold hopping was assessed and compared with that of the 1024-bit Morgan2 fingerprint (the RDKit implementation of the ECFP4 fingerprint). In both similarity searching and biological activity classification, the QAFFP fingerprint yields retrieval rates comparable to those of the Morgan2 fingerprint, as measured by AUC (QAFFP: ~ 0.65 and ~ 0.70 for similarity searching, depending on the data set, and ~ 0.85 for classification; Morgan2: ~ 0.57 and ~ 0.66 for similarity searching, depending on the data set, and ~ 0.87 for classification) and by EF5 (QAFFP: ~ 4.67 and ~ 5.82 for similarity searching and ~ 2.10 for classification; Morgan2: ~ 4.09 and ~ 6.41 for similarity searching and ~ 2.16 for classification). However, the QAFFP fingerprint outperforms the Morgan2 fingerprint in scaffold hopping: it retrieves 1146 of the 1749 existing scaffolds, whereas the Morgan2 fingerprint retrieves only 864.
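To make the fingerprint construction concrete, the following is a minimal sketch (not the authors' code) that assembles an rv-QAFFP as a list of per-target predictions and derives a b-QAFFP by thresholding it. Several assumptions are made. Each "model" is any callable returning a predicted affinity, standing in for the trained Random Forest regressors. The binarization cutoff of 6.0 is a hypothetical value chosen only for illustration. The continuous Tanimoto coefficient is one common similarity measure for real-valued vectors, not necessarily the one used in the paper. All function names are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a trained per-target regression model (e.g. a
# Random Forest): it maps a compound representation to a predicted pActivity.
AffinityModel = Callable[[object], float]


def rv_qaffp(compound: object, models: Sequence[AffinityModel]) -> List[float]:
    """Real-valued fingerprint: one predicted affinity per reference target."""
    return [model(compound) for model in models]


def b_qaffp(rv: Sequence[float], threshold: float = 6.0) -> List[int]:
    """Binary fingerprint: 1 where the predicted affinity reaches `threshold`.

    The default of 6.0 is an illustrative assumption, not the paper's cutoff.
    """
    return [1 if value >= threshold else 0 for value in rv]


def tanimoto_binary(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto coefficient for bit vectors: |A AND B| / |A OR B|."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def tanimoto_continuous(a: Sequence[float], b: Sequence[float]) -> float:
    """Continuous Tanimoto: a.b / (a.a + b.b - a.b); an assumed choice for rv-QAFFP."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0
```

With three toy models returning 7.2, 5.1 and 6.0, `rv_qaffp` yields `[7.2, 5.1, 6.0]` and `b_qaffp` yields `[1, 0, 1]`. A real QAFFP would have 440 entries, one per reference target.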
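The two retrieval metrics quoted in the abstract can both be computed from a similarity-ranked list. The sketch below makes a few assumptions. It takes EF5 to be the enrichment factor in the top 5 % of the ranking, the usual reading of the abbreviation. It computes ROC AUC as the probability that a randomly chosen active scores higher than a randomly chosen inactive. The function names are hypothetical, and how ties in the ranking are ordered is an arbitrary choice.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(score of a random active > score of a random inactive).

    Ties count as one half. The pairwise loop is O(n_actives * n_inactives),
    which is fine for a sketch but slow for large screening sets.
    """
    actives = [s for s, y in zip(scores, labels) if y]
    inactives = [s for s, y in zip(scores, labels) if not y]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


def enrichment_factor(
    scores: Sequence[float], labels: Sequence[int], fraction: float = 0.05
) -> float:
    """Enrichment factor in the top `fraction` of the ranking (0.05 -> EF5).

    EF = (actives in top / size of top) / (all actives / all compounds).
    """
    n_total = len(scores)
    n_actives = sum(1 for y in labels if y)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    # Rank by descending similarity; equal scores keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(n_total * fraction))
    hits = sum(1 for _, y in ranked[:n_top] if y)
    return (hits / n_top) / (n_actives / n_total)
```

Take 20 compounds with actives at ranks 1 and 6. EF5 looks only at the top compound and gives (1/1) / (2/20) = 10. The AUC is 32/36 ≈ 0.89, because the second active is outranked by 4 of the 18 inactives. The example also shows why EF5 can vary much more than AUC on small data sets.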
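Scaffold hopping can be scored by counting how many distinct scaffolds, other than the query's own, appear among the actives that a fingerprint ranks near the top. The abstract does not spell out the exact protocol, so the following is a hypothetical sketch. Scaffold labels are assumed to be precomputed strings; Bemis-Murcko scaffolds, which RDKit provides via `rdkit.Chem.Scaffolds.MurckoScaffold`, are one option. The cutoff `top_n` is a free parameter, and the per-query results are pooled by set union.

```python
from typing import Dict, Iterable, List, Set


def retrieved_scaffolds(
    ranked_ids: List[str],
    active_scaffolds: Dict[str, str],
    query_scaffold: str,
    top_n: int,
) -> Set[str]:
    """Scaffolds of active hits in the top_n that differ from the query's scaffold.

    `active_scaffolds` maps active compound IDs to scaffold labels; inactive
    compounds are simply absent from it (a hypothetical data layout).
    """
    found: Set[str] = set()
    for cid in ranked_ids[:top_n]:
        scaffold = active_scaffolds.get(cid)  # None for inactive compounds
        if scaffold is not None and scaffold != query_scaffold:
            found.add(scaffold)
    return found


def total_scaffolds_retrieved(per_query: Iterable[Set[str]]) -> int:
    """Number of distinct scaffolds found across all queries (set union)."""
    union: Set[str] = set()
    for scaffolds in per_query:
        union |= scaffolds
    return len(union)
```

Pooling by union means that a scaffold counts once no matter how many queries retrieve it. This matches the spirit of reporting "X of Y existing scaffolds", but it is an assumed aggregation, not necessarily the paper's.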