Structural Chemogenomics, Laboratory of Therapeutic Innovation, UMR 7200 CNRS-UdS (Université de Strasbourg), F-67400 Illkirch, France.
J Chem Inf Model. 2010 Jan;50(1):123-35. doi: 10.1021/ci900349y.
Inferring the biological function of a protein from its three-dimensional structure, and explaining why a drug may bind to several targets, are of crucial importance to modern drug discovery. Here we present a generic vector of 4833 integers that describes druggable protein-ligand binding sites and can be applied to any protein and any binding cavity. The fingerprint registers counts of pharmacophoric triplets computed from the Cα atomic coordinates of binding-site-lining residues. Starting from a customized data set of diverse protein-ligand binding site pairs, the most appropriate metric and a similarity threshold for identifying similar binding sites could be defined. The method (FuzCav) has been used in several scenarios: (i) screening a collection of 6000 binding sites for similarity to different queries; (ii) classifying protein families (serine endopeptidases, protein kinases) by binding site diversity; (iii) discriminating adenine-binding cavities from decoys. Fingerprint generation and comparison support ultra-high throughput (ca. 1000 measures/s), do not require prior alignment of protein binding sites, and can detect local similarity among subpockets. The method is thus particularly well suited to the functional annotation of novel genomic structures with low sequence identity to known X-ray templates.
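To make the idea of an alignment-free triplet-count fingerprint concrete, here is a minimal Python sketch. It is not the FuzCav implementation: the property labels, the distance bin edges, the sparse `Counter` representation (instead of a fixed 4833-integer vector), and the similarity function are all illustrative assumptions, since the abstract does not specify them. The sketch only shows why counting distance-binned pharmacophoric triplets over Cα positions needs no prior superposition of the two sites.

```python
from collections import Counter
from itertools import combinations
from math import dist

# Hypothetical upper edges of the distance bins, in angstroms.
# These values are assumptions for illustration, not taken from the paper.
BIN_EDGES = (5.0, 7.5, 10.0, 12.5, 15.0)


def distance_bin(d):
    """Return the index of the first bin whose upper edge is >= d, or None if d is too large."""
    for index, edge in enumerate(BIN_EDGES):
        if d <= edge:
            return index
    return None


def fingerprint(points):
    """Count pharmacophoric triplets in a binding site.

    `points` is a list of (property_label, (x, y, z)) tuples, one per
    pharmacophoric property carried by a Calpha atom (hypothetical input format).
    Returns a Counter mapping a canonical triplet key to its count.
    """
    counts = Counter()
    for a, b, c in combinations(points, 3):
        # Pair each vertex property with the binned length of the opposite edge.
        pairs = (
            (a[0], distance_bin(dist(b[1], c[1]))),
            (b[0], distance_bin(dist(a[1], c[1]))),
            (c[0], distance_bin(dist(a[1], b[1]))),
        )
        if any(bin_index is None for _, bin_index in pairs):
            continue  # skip triplets with an edge beyond the last bin
        # Sorting makes the key independent of the order in which points are listed.
        counts[tuple(sorted(pairs))] += 1
    return counts


def similarity(fp_a, fp_b):
    """Stand-in similarity (an assumption): shared non-zero triplet types
    divided by the number of triplet types in the smaller fingerprint."""
    smallest = min(len(fp_a), len(fp_b))
    if smallest == 0:
        return 0.0
    return len(fp_a.keys() & fp_b.keys()) / smallest


if __name__ == "__main__":
    site = [
        ("aromatic", (0.0, 0.0, 0.0)),
        ("donor", (4.0, 0.0, 0.0)),
        ("acceptor", (0.0, 5.0, 0.0)),
        ("donor", (0.0, 0.0, 6.0)),
    ]
    # The same site after a rigid-body move (axis permutation plus translation).
    moved = [(p, (y + 10.0, z - 3.0, x + 2.0)) for p, (x, y, z) in site]
    print(similarity(fingerprint(site), fingerprint(moved)))
```

Because the key depends only on property labels and binned inter-point distances, a rigid-body move of the site leaves the fingerprint unchanged, which is the property that lets such a descriptor skip the alignment step. Dividing by the smaller fingerprint is one way to let a small subpocket score highly against a larger cavity that contains it.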