Adams Melanie A, Suits Michael D L, Zheng Jimin, Jia Zongchao
Department of Biochemistry, Queen's University, Kingston, ON, Canada.
Proteomics. 2007 Aug;7(16):2920-32. doi: 10.1002/pmic.200700099.
The combination of genome sequencing with structural genomics has provided a wealth of new structures for previously uncharacterized ORFs, more commonly referred to as hypothetical proteins. This rapid growth is the direct result of high-throughput, automated approaches both to identifying new ORFs and to determining high-resolution 3-D protein structures. A significant bottleneck arises, however, at the stage of functional annotation, because the assignment of function is not readily automated. Often the initial structural analysis at best indicates a functional family for a given hypothetical protein. Further identification of a relevant ligand or substrate is then impeded by the diversity of function within a particular Structural Classification of Proteins (SCOP) family, by a highly selective and specific ligand-binding site, or by a novel protein fold. Our approach to the functional annotation of hypothetical proteins combines structural information with additional evidence from operon prediction, approximate functional information about other members of the same operon, conservation of catalytic residues, cocrystallization trials, and virtual ligand screening. Synthesizing all available information for each protein has permitted the functional annotation of several hypothetical proteins from Escherichia coli, and each assignment has been confirmed by generally accepted biochemical methods.