Paul Nicodéme, Kellenberger Esther, Bret Guillaume, Müller Pascal, Rognan Didier
Bioinformatics Group, Laboratoire de Pharmacochimie de la Communication Cellulaire, CNRS UMR 7081, Illkirch, France.
Proteins. 2004 Mar 1;54(4):671-80. doi: 10.1002/prot.10625.
The Protein Data Bank (PDB) has been processed to extract a screening protein library (sc-PDB) of 2148 entries. A knowledge-based detection algorithm was applied to 18,000 PDB files to find regular expressions corresponding to protein, ion, co-factor, solvent, or ligand atoms. The sc-PDB database comprises high-resolution X-ray structures of proteins for which (i) a well-defined active site exists and (ii) the bound ligand is a low-molecular-weight molecule. The database was screened with an inverse docking tool derived from the GOLD program to recover the known targets of four unrelated ligands. Both the database and the inverse screening procedure are accurate enough to rank the true target of each of the four investigated ligands among the top 1% of scorers, a 70- to 100-fold enrichment over random screening. Applying the proposed screening procedure to a small, generic ligand was much less accurate, suggesting that inverse screening should be reserved for rather selective compounds.
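The enrichment figure quoted in the abstract can be illustrated with a short calculation. The sketch below assumes the common definition of an enrichment factor: the hit rate among the top-ranked fraction of the library divided by the hit rate expected from random selection. The abstract does not state the exact formula used, so this definition, the function name `enrichment_factor`, and the example counts (10 true-target entries, 7 of them ranked in the top 1%) are illustrative assumptions, not values from the study. Only the library size of 2148 entries comes from the text.

```python
def enrichment_factor(ranked_is_target, top_fraction=0.01):
    """Enrichment of true targets in the top-scoring fraction of a ranked list.

    ranked_is_target: booleans ordered from best to worst score,
                      True where the entry is a known target of the ligand.
    top_fraction:     fraction of the library treated as "top scorers".
    """
    n_total = len(ranked_is_target)
    n_targets = sum(ranked_is_target)
    if n_total == 0 or n_targets == 0:
        raise ValueError("need a non-empty list with at least one true target")

    # Number of entries in the top fraction (at least one entry).
    n_top = max(1, round(n_total * top_fraction))
    hits_in_top = sum(ranked_is_target[:n_top])

    hit_rate_top = hits_in_top / n_top        # observed in the top fraction
    hit_rate_random = n_targets / n_total     # expected from random picking
    return hit_rate_top / hit_rate_random


# Hypothetical ranking over a 2148-entry library: 10 entries belong to the
# true target, 7 of them fall within the top 1% (21 entries).
flags = [True] * 7 + [False] * 14 + [True] * 3 + [False] * (2148 - 24)
print(round(enrichment_factor(flags), 1))
```

Under this definition, a ligand with a single true-target entry ranked anywhere in the top 1% would give an enrichment of about 100, which sets the upper bound for a 1% cutoff.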