Grant Marianne A
Division of Molecular and Vascular Medicine and Center for Vascular Biology Research, Beth Israel Deaconess Medical Center, Department of Medicine, Harvard Medical School, Boston, Massachusetts, 02215.
Drug Dev Res. 2011 Feb;72(1):4-16. doi: 10.1002/ddr.20397.
Pharmaceutical researchers must evaluate vast numbers of protein sequences and formulate innovative strategies for identifying valid targets and discovering leads against them in order to accelerate drug discovery. The ever-increasing number and diversity of novel protein sequences identified by genomic sequencing projects, together with the success of worldwide structural genomics initiatives, have spurred great interest in developing accurate computational methods for protein function prediction and active site identification. Previously, in the absence of direct experimental evidence, homology-based protein function annotation was the gold standard for analysis and prediction of protein function. However, with the continued exponential expansion of sequence databases, this approach is not always applicable, as fewer query protein sequences demonstrate significant homology to gene products of known function. As a result, several non-homology-based methods for protein function prediction have emerged that draw on sequence features, structure, evolution, and biochemical and genetic knowledge. Herein, we review current bioinformatic programs and approaches for protein function prediction and annotation and discuss their integration into drug discovery initiatives. The development of such methods to annotate protein functional sites, and their application to large protein functional families, is crucial to successfully utilizing the vast amounts of genomic sequence information available to drug discovery and development processes.