GlaxoSmithKline, Collegeville, PA, USA.
J Comput Aided Mol Des. 2009 Nov;23(11):773-84. doi: 10.1007/s10822-009-9273-4. Epub 2009 Jun 20.
Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph in which vertices are residues (labeled by residue name) and edges connect residues that are in geometric proximity. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well-characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullmann's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
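The query step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the 8.0 Å distance cutoff, the use of one coordinate per residue, the function names, and the toy data. The matcher is a plain backtracking search for label-preserving subgraph matches, without the refinement step of Ullmann's algorithm and without a graph index. The confidence function is a simple empirical p-value, chosen here only because the abstract does not state the statistic used.

```python
from itertools import combinations
from math import dist


def build_contact_graph(residues, cutoff=8.0):
    """Build a labeled graph from (residue_name, (x, y, z)) pairs.

    Two residues share an edge when their points lie within `cutoff`
    (an assumed threshold, not taken from the paper).
    """
    labels = [name for name, _ in residues]
    adj = {i: set() for i in range(len(residues))}
    for i, j in combinations(range(len(residues)), 2):
        if dist(residues[i][1], residues[j][1]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return labels, adj


def contains_motif(motif, graph):
    """Return True if the labeled motif occurs as a subgraph of graph.

    Plain backtracking: map motif vertices one at a time to unused
    graph vertices with the same label, requiring every motif edge
    to an already-mapped vertex to exist in the graph as well.
    """
    m_labels, m_adj = motif
    g_labels, g_adj = graph
    mapping = {}

    def extend(u):
        if u == len(m_labels):
            return True
        for v in range(len(g_labels)):
            if v in mapping.values() or g_labels[v] != m_labels[u]:
                continue
            if all(mapping[w] in g_adj[v] for w in m_adj[u] if w in mapping):
                mapping[u] = v
                if extend(u + 1):
                    return True
                del mapping[u]
        return False

    return extend(0)


def count_motif_hits(motifs, graph):
    """Count how many family-specific motifs occur in the query graph."""
    return sum(contains_motif(m, graph) for m in motifs)


def empirical_p_value(hits, background_hits):
    """Fraction of background structures with at least `hits` motif hits.

    Uses add-one smoothing; this statistic is an assumption of the sketch.
    """
    at_least = sum(1 for b in background_hits if b >= hits)
    return (at_least + 1) / (len(background_hits) + 1)


# Toy query structure (hypothetical coordinates in angstroms).
query = build_contact_graph([
    ("HIS", (0.0, 0.0, 0.0)),
    ("ASP", (5.0, 0.0, 0.0)),
    ("SER", (0.0, 5.0, 0.0)),
    ("GLY", (30.0, 0.0, 0.0)),
])

# Two hypothetical motifs: a SER-HIS-ASP triangle and a HIS-GLY contact.
triad = (["SER", "HIS", "ASP"], {0: {1, 2}, 1: {0, 2}, 2: {0, 1}})
his_gly = (["HIS", "GLY"], {0: {1}, 1: {0}})

hits = count_motif_hits([triad, his_gly], query)
p_value = empirical_p_value(hits, background_hits=[0, 0, 1, 0, 2])
```

In this toy case the triangle motif is found and the HIS-GLY contact is not, because the GLY residue lies beyond the cutoff from every other residue. A real pipeline would mine the motifs from family members first and would need an index or stronger pruning to scale to a large structure database.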