Kellenberger Esther, Muller Pascal, Schalon Claire, Bret Guillaume, Foata Nicolas, Rognan Didier
CNRS UMR7175-LC1, Institut Gilbert Laustriat, 74 Route du Rhin, F-67401 Illkirch Cédex, France.
J Chem Inf Model. 2006 Mar-Apr;46(2):717-27. doi: 10.1021/ci050372x.
The sc-PDB is a collection of 6 415 three-dimensional structures of binding sites found in the Protein Data Bank (PDB). Binding sites were extracted from all high-resolution crystal structures in which a complex between a protein cavity and a small-molecular-weight ligand could be identified. Importantly, ligands are considered from a pharmacological and not a structural point of view. Therefore, solvents, detergents, and most metal ions are not stored in the sc-PDB. Ligands are classified into four main categories: nucleotides (< 4-mer), peptides (< 9-mer), cofactors, and organic compounds. The corresponding binding site is formed by all protein residues (including amino acids, cofactors, and important metal ions) with at least one atom within 6.5 angstroms of any ligand atom. The database was carefully annotated by browsing several protein databases (PDB, UniProt, and GO) and storing, for every sc-PDB entry, the following features: protein name, function, source, domain and mutations, ligand name, and structure. The repository of ligands has also been archived by diversity analysis of molecular scaffolds, and several chemoinformatics descriptors were computed to better understand the chemical space covered by stored ligands. The sc-PDB may be used for several purposes: (i) screening a collection of binding sites for predicting the most likely target(s) of any ligand, (ii) analyzing the molecular similarity between different cavities, and (iii) deriving rules that describe the relationship between ligand pharmacophoric points and active-site properties. The database is periodically updated and accessible on the web at http://bioinfo-pharma.u-strasbg.fr/scPDB/.
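The distance criterion that defines a binding site can be illustrated with a minimal sketch. This is not the authors' extraction code: the function name, the `(residue_id, (x, y, z))` input layout, and the toy coordinates are assumptions made for illustration only. The 6.5 angstrom cutoff is the only value taken from the abstract.

```python
import math

CUTOFF_ANGSTROM = 6.5  # distance threshold stated in the abstract


def binding_site_residues(protein_atoms, ligand_coords, cutoff=CUTOFF_ANGSTROM):
    """Return sorted IDs of residues with at least one atom within `cutoff` of any ligand atom.

    protein_atoms: iterable of (residue_id, (x, y, z)) tuples (hypothetical layout).
    ligand_coords: iterable of (x, y, z) tuples, one per ligand atom.
    """
    ligand_coords = list(ligand_coords)
    selected = set()
    for residue_id, xyz in protein_atoms:
        if residue_id in selected:
            continue  # one qualifying atom is enough for the whole residue
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_coords):
            selected.add(residue_id)
    return sorted(selected)


# Toy coordinates in angstroms, invented for this example.
protein = [
    ("ALA10", (0.0, 0.0, 0.0)),
    ("ALA10", (1.5, 0.0, 0.0)),
    ("GLY11", (9.0, 0.0, 0.0)),
    ("GLY11", (5.0, 0.0, 0.0)),
    ("SER50", (20.0, 0.0, 0.0)),
]
ligand = [(0.0, 3.0, 0.0)]

site = binding_site_residues(protein, ligand)
```

In this sketch a residue is kept as a whole as soon as any one of its atoms falls inside the cutoff, which mirrors the "at least one atom" wording of the definition. Cofactors and metal ions would be handled the same way if they were passed in as additional entries of `protein_atoms`.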