Hu Liegi, Benson Mark L, Smith Richard D, Lerner Michael G, Carlson Heather A
Department of Medicinal Chemistry, College of Pharmacy, University of Michigan, Ann Arbor, Michigan 48109-1065, USA.
Proteins. 2005 Aug 15;60(3):333-40. doi: 10.1002/prot.20512.
Binding MOAD (Mother of All Databases) is the largest collection of high-quality protein-ligand complexes available from the Protein Data Bank. At this time, Binding MOAD contains 5331 protein-ligand complexes comprising 1780 unique protein families and 2630 unique ligands. We searched the crystallography papers for all 5000+ structures and compiled binding data for 1375 (26%) of the protein-ligand complexes. The binding-affinity data span 13 orders of magnitude. This is the largest collection of binding data reported to date in the literature. We have also addressed the issue of redundancy in the data. To create a nonredundant dataset, one protein from each of the 1780 protein families was chosen as a representative. Representatives were chosen by criteria such as tightest binding and best resolution. Of the 1780 "best" complexes that make up the nonredundant version of Binding MOAD, 475 (27%) have binding data. This significant collection of protein-ligand complexes will be useful in elucidating the biophysical patterns of molecular recognition and enzymatic regulation. The complexes with binding-affinity data will help in the development of improved scoring functions and structure-based drug discovery techniques. The dataset can be accessed at http://www.BindingMOAD.org.
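The selection of one representative per protein family can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract names only tightest binding and best resolution among its criteria, so the priority order used here (complexes with binding data first, then lowest affinity value, then lowest resolution value) is an assumption, as are the `Complex` record, its field names, and the example entries, which are invented placeholders rather than Binding MOAD data.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Complex:
    """Hypothetical record for one protein-ligand complex."""
    entry_id: str                            # placeholder identifier, not a real PDB code
    family: str                              # protein family label
    resolution: float                        # crystallographic resolution in angstroms (lower is better)
    affinity_molar: Optional[float] = None   # Kd/Ki/IC50 in mol/L; None if no binding data


def sort_key(c: Complex):
    """Assumed ranking: has binding data, then tightest binding, then best resolution."""
    has_no_data = c.affinity_molar is None
    affinity = math.inf if has_no_data else c.affinity_molar
    return (has_no_data, affinity, c.resolution)


def pick_representatives(complexes: Iterable[Complex]) -> Dict[str, Complex]:
    """Keep the single best-ranked complex for each protein family."""
    best: Dict[str, Complex] = {}
    for c in complexes:
        current = best.get(c.family)
        if current is None or sort_key(c) < sort_key(current):
            best[c.family] = c
    return best


# Invented example entries, for illustration only.
examples = [
    Complex("cplx_a", "family_1", resolution=2.1, affinity_molar=1e-6),
    Complex("cplx_b", "family_1", resolution=1.8, affinity_molar=1e-9),
    Complex("cplx_c", "family_1", resolution=1.2),
    Complex("cplx_d", "family_2", resolution=2.4),
    Complex("cplx_e", "family_2", resolution=1.9),
]

representatives = pick_representatives(examples)
for family, rep in sorted(representatives.items()):
    print(family, rep.entry_id)
```

In this sketch a family with no binding data at all falls back to resolution alone. Any further criteria covered by the abstract's "etc." are left out rather than guessed.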