Bandyopadhyay Deepak, Huan Jun, Liu Jinze, Prins Jan, Snoeyink Jack, Wang Wei, Tropsha Alexander
Department of Computer Science, University of North Carolina at Chapel Hill, North Carolina 27599, USA.
Protein Sci. 2006 Jun;15(6):1537-43. doi: 10.1110/ps.062189906.
We describe a method to assign a protein structure to a functional family using family-specific fingerprints. Fingerprints represent amino acid packing patterns that occur in most members of a family but are rare in the background, a nonredundant subset of the PDB; they provide information complementary to sequence alignments, sequence patterns, structural superposition, and active-site templates. Fingerprints were derived for 120 families in SCOP using Frequent Subgraph Mining. For a new structure, all occurrences of these family-specific fingerprints can be found by a fast algorithm for subgraph isomorphism; the structure can then be assigned to a family with a confidence value derived from the number of fingerprints found and their distribution in background proteins. In validation experiments, we infer the function of new members added to SCOP families and discriminate between structurally similar but functionally divergent TIM barrel families. We then apply our method to predict function for several structural genomics proteins, including orphan structures. Some predictions have been corroborated by other computational methods, and some have been validated by subsequent functional characterization.
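The assignment step described above (find fingerprint occurrences in a new structure, then turn the count into a confidence using the background distribution) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the graph encoding (residue-type labels on nodes, contacts as edges), the function names `has_fingerprint`, `count_fingerprints`, and `family_confidence`, the naive backtracking matcher (not the fast subgraph isomorphism algorithm the abstract refers to), and the empirical confidence formula (the abstract does not state how the confidence value is computed).

```python
def has_fingerprint(fp_labels, fp_edges, labels, edges):
    """Return True if the labeled fingerprint graph occurs in the structure graph.

    Naive backtracking: every fingerprint edge must map onto a structure edge
    and node labels (residue types) must match. Extra structure edges are allowed.
    """
    fp_nodes = list(fp_labels)
    adj = set()
    for u, v in edges:
        adj.add((u, v))
        adj.add((v, u))

    def extend(mapping):
        if len(mapping) == len(fp_nodes):
            return True
        node = fp_nodes[len(mapping)]
        for cand, lab in labels.items():
            if lab != fp_labels[node] or cand in mapping.values():
                continue
            ok = True
            for a, b in fp_edges:
                # Find the other endpoint of each fingerprint edge touching `node`.
                other = b if a == node else (a if b == node else None)
                if other is not None and other in mapping:
                    if (mapping[other], cand) not in adj:
                        ok = False
                        break
            if ok:
                mapping[node] = cand
                if extend(mapping):
                    return True
                del mapping[node]
        return False

    return extend({})


def count_fingerprints(fingerprints, labels, edges):
    """Count how many of a family's fingerprints occur in the structure."""
    return sum(has_fingerprint(fl, fe, labels, edges) for fl, fe in fingerprints)


def family_confidence(n_found, background_counts):
    """Hypothetical confidence: fraction of background proteins that contain
    fewer of the family's fingerprints than the query structure does."""
    if not background_counts:
        raise ValueError("background_counts must not be empty")
    fewer = sum(1 for c in background_counts if c < n_found)
    return fewer / len(background_counts)


# Toy query structure: four residues with made-up contacts.
labels = {1: "H", 2: "D", 3: "S", 4: "G"}
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]

# Two toy fingerprints: an H-D-S triangle and an H-G contact.
triangle = ({"a": "H", "b": "D", "c": "S"}, [("a", "b"), ("b", "c"), ("a", "c")])
pair = ({"a": "H", "b": "G"}, [("a", "b")])

n_found = count_fingerprints([triangle, pair], labels, edges)
confidence = family_confidence(n_found, [0, 0, 1, 0])
```

In this toy example only the triangle occurs in the query, and three of the four made-up background proteins contain fewer fingerprints, so the sketch reports a confidence of 0.75. A real implementation would need a much faster matcher and a statistically grounded confidence model.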