Arnold James R, Burdick Keith W, Pegg Scott C-H, Toba Samuel, Lamb Michelle L, Kuntz Irwin D
Department of Pharmaceutical Chemistry, School of Pharmacy, University of California San Francisco, Box 2240, N474-A Genentech Hall, San Francisco, California 94143-2240, USA.
J Chem Inf Comput Sci. 2004 Nov-Dec;44(6):2190-8. doi: 10.1021/ci049814f.
Integrating biological and chemical information is a key task in drug discovery, and one approach is to use three-dimensional pharmacophore descriptors derived from protein binding sites. The SitePrint program generates, aligns, scores, and classifies three-dimensional pharmacophore descriptors, active site grids, and ligand surfaces. The descriptors are formed from molecular fragments that have been docked, minimized, filtered, and clustered in protein active sites. The descriptors have geometric coordinates derived from the fragment positions, and they capture the shape, electrostatics, locations, and angles of entry into pockets of the recognition sites; they also provide a direct link to databases of organic molecules. The descriptors have been shown to be robust to the small changes in protein structure observed when multiple compounds are cocrystallized with a protein. Five aligned thrombin cocrystals with an average core alpha-carbon RMSD of 0.7 Å gave three-dimensional pharmacophore descriptors with an average RMSD of 1.1 Å. On a larger test set, alignment and scoring of the descriptors, using clique-based alignment and a best-first search strategy with an adapted forward-looking Ullmann heuristic, selected the global minimum three-dimensional alignment in twenty-nine out of thirty cases in less than one CPU second on a workstation. A protein-family-based analysis was then performed to demonstrate the usefulness of the method in correlating active site pharmacophore descriptors with protein function. Each protein in a test set of thirty was assigned to a family based on computed active site similarity to the following families: kinases, nuclear receptors, and the aspartyl, cysteine, serine, and metallo proteases. This method of classifying proteins is complementary to approaches based on sequence or fold homology. Within protein families, the rate of correct family assignment ranged from 25% to 80%.
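The clique-based alignment mentioned above can be illustrated with a minimal sketch. This is not SitePrint's implementation: the abstract describes a best-first search with an adapted Ullmann heuristic, whereas the sketch below substitutes a plain Bron-Kerbosch search with pivoting. The feature labels, the tolerance `tol`, and the function names `correspondence_graph` and `max_clique` are all hypothetical, chosen only for illustration. The idea is to pair same-type points from two descriptors, connect pairs whose internal distances agree, and take the largest clique as the point correspondence.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def correspondence_graph(a, b, tol=0.5):
    """Build a correspondence graph between two descriptors.

    a, b: lists of (feature_type, (x, y, z)) points (hypothetical format).
    A node (i, j) pairs point i of a with point j of b when their types match.
    Two nodes are adjacent when the distance between their points in a agrees
    with the distance between their points in b to within tol (assumed, in Å).
    """
    nodes = [(i, j)
             for i, (type_a, _) in enumerate(a)
             for j, (type_b, _) in enumerate(b)
             if type_a == type_b]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # one point cannot be matched twice
        if abs(dist(a[i][1], a[k][1]) - dist(b[j][1], b[l][1])) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def max_clique(adj):
    """Return one largest clique, found by Bron-Kerbosch with pivoting."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return sorted(best)


# Toy descriptors: site_b is site_a translated by 10 Å along x, with its
# points reordered and one unrelated donor point added.
site_a = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("hydrophobe", (0, 4, 0))]
site_b = [("acceptor", (13, 0, 0)), ("hydrophobe", (10, 4, 0)),
          ("donor", (10, 0, 0)), ("donor", (20, 20, 20))]

matches = max_clique(correspondence_graph(site_a, site_b))
print(matches)  # index pairs (point in site_a, point in site_b)
```

The matched index pairs could then be passed to a least-squares superposition to obtain an RMSD of the kind quoted for the thrombin cocrystals; that step is omitted here.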