Institute of Biochemistry and Biophysics PAS, Pawinskiego 5a, Warsaw, Poland.
Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Banacha 2, Warsaw, Poland.
Bioinformatics. 2019 Apr 15;35(8):1334-1341. doi: 10.1093/bioinformatics/bty757.
Fingerprints (FPs) are the most common small-molecule representation in cheminformatics. A wide variety of FPs exists, and the Extended Connectivity Fingerprint (ECFP) is one of the best suited for general applications. Despite this abundance, only a few FPs represent the 3D structure of the molecule, and hardly any encode protein-ligand interactions.
Here, we present a Protein-Ligand Extended Connectivity (PLEC) FP that implicitly encodes protein-ligand interactions by pairing the ECFP environments from the ligand and the protein. PLEC FPs were used to construct different machine learning models tailored for predicting protein-ligand affinities (pKi/d). Even the simplest linear model built on the PLEC FP achieved a Pearson correlation coefficient of Rp = 0.817 on the PDBbind v2016 'core set', demonstrating its descriptive power.
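The pairing idea can be illustrated with a toy sketch. This is not the ODDT implementation: the function names, the input layout (each atom as coordinates plus a tuple of precomputed per-depth environment identifiers), the SHA-1 based hashing, the 4.5 Å contact cutoff and the 1024-bit folding size are all assumptions made for illustration only.

```python
import hashlib
import math
from collections import Counter


def hash_pair(lig_env: int, prot_env: int, size: int) -> int:
    """Map one (ligand env, protein env) pair to an index in [0, size).

    Hypothetical hashing scheme; a real implementation may differ.
    """
    digest = hashlib.sha1(f"{lig_env}:{prot_env}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % size


def paired_fingerprint(ligand, protein, cutoff=4.5, size=1024):
    """Build a folded count fingerprint from contacting atom pairs.

    ligand, protein: lists of (xyz, env_ids), where env_ids is a tuple of
    integer environment identifiers, one per depth (assumed precomputed).
    cutoff: assumed contact distance in angstroms.
    """
    counts = Counter()
    for lig_xyz, lig_envs in ligand:
        for prot_xyz, prot_envs in protein:
            # Only atoms close enough to interact contribute features.
            if math.dist(lig_xyz, prot_xyz) > cutoff:
                continue
            # Pair the environments depth by depth and fold each pair
            # into the fixed-length vector.
            for lig_env, prot_env in zip(lig_envs, prot_envs):
                counts[hash_pair(lig_env, prot_env, size)] += 1
    return counts


# Made-up data: two ligand atoms, two protein atoms, two depths each.
ligand = [((0.0, 0.0, 0.0), (11, 12)), ((10.0, 0.0, 0.0), (21, 22))]
protein = [((3.0, 0.0, 0.0), (31, 32)), ((0.0, 20.0, 0.0), (41, 42))]
fp = paired_fingerprint(ligand, protein)
```

Only the first ligand atom and the first protein atom lie within the cutoff here, so the fingerprint holds one feature per depth for that single contact. A sparse count vector of this kind is the sort of input a linear model could be trained on.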
The PLEC FP has been implemented in the Open Drug Discovery Toolkit (https://github.com/oddt/oddt).
Supplementary data are available at Bioinformatics online.