Advanced Campus at Itabira, Universidade Federal de Itajubá, Itajubá, MG 35903-087, Brazil and Department of Computer Science and Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brazil.
Bioinformatics. 2015 Mar 15;31(6):864-70. doi: 10.1093/bioinformatics/btu746. Epub 2014 Nov 10.
Currently, 25% of the proteins annotated in Pfam have no known function. One way to predict a protein's function is to examine its active site, which has two main parts: the catalytic site and the substrate binding site. The active site is more conserved than the other residues of the protein and can be a rich source of information for protein function prediction. This article presents a new heuristic method, named genetic active site search (GASS), which searches for given active site 3D templates in unknown proteins. The method can perform non-exact amino acid matches (conservative mutations), can find amino acids in different chains and imposes no restriction on the active site size.
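The idea of searching for an active site template with a genetic heuristic can be illustrated with a minimal sketch. This is not the GASS implementation: the conservative-substitution groups, the distance-based fitness function, the data layout and all names (`compatible`, `fitness`, `search`) are assumptions chosen for illustration. A template is a list of residue types with 3D coordinates, and a candidate solution assigns one protein residue to each template position.

```python
import math
import random
from itertools import combinations

# Hypothetical conservative-substitution groups (an assumption, not the paper's table).
CONSERVATIVE_GROUPS = [{"ASP", "GLU"}, {"SER", "THR"}, {"LYS", "ARG"}, {"ASN", "GLN"}]

def compatible(template_type, residue_type):
    """True for an exact match or a substitution within one conservative group."""
    if template_type == residue_type:
        return True
    return any(template_type in g and residue_type in g for g in CONSERVATIVE_GROUPS)

def fitness(template, protein, assignment):
    """Sum of absolute differences between template and candidate pairwise distances.

    Lower is better; 0.0 means the candidate reproduces the template geometry.
    """
    total = 0.0
    for i, j in combinations(range(len(template)), 2):
        d_template = math.dist(template[i][1], template[j][1])
        d_candidate = math.dist(protein[assignment[i]]["xyz"], protein[assignment[j]]["xyz"])
        total += abs(d_template - d_candidate)
    return total

def search(template, protein, generations=200, pop_size=30, seed=0):
    """Toy genetic search: returns the best assignment found, or None if a
    template position has no compatible residue in the protein."""
    rng = random.Random(seed)
    # Candidate residue indices per template position; the chain is ignored,
    # so one solution may combine residues from different chains.
    pools = [[k for k, r in enumerate(protein) if compatible(t, r["type"])]
             for t, _ in template]
    if any(not pool for pool in pools):
        return None

    def score(assignment):
        return fitness(template, protein, assignment)

    population = [[rng.choice(pool) for pool in pools] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        survivors = population[: pop_size // 2]  # keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # One-point crossover, then an occasional single-position mutation.
            cut = rng.randrange(1, len(template)) if len(template) > 1 else 0
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:
                pos = rng.randrange(len(template))
                child[pos] = rng.choice(pools[pos])
            children.append(child)
        population = survivors + children
    return min(population, key=score)

# Made-up example: a three-residue template and a five-residue, two-chain protein.
template = [("HIS", (0.0, 0.0, 0.0)), ("ASP", (3.0, 0.0, 0.0)), ("SER", (0.0, 4.0, 0.0))]
protein = [
    {"type": "GLY", "chain": "A", "xyz": (9.0, 9.0, 9.0)},
    {"type": "HIS", "chain": "A", "xyz": (10.0, 10.0, 10.0)},
    {"type": "GLU", "chain": "B", "xyz": (13.0, 10.0, 10.0)},
    {"type": "SER", "chain": "A", "xyz": (10.0, 14.0, 10.0)},
    {"type": "SER", "chain": "B", "xyz": (20.0, 20.0, 20.0)},
]
best = search(template, protein)
```

In this made-up example the template aspartate can only be matched by a glutamate on chain B, which exercises both the non-exact match and the cross-chain case described above. Because the fitness compares only pairwise distances, the sketch needs no structural superposition and places no limit on the number of template residues.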
GASS results were compared with those catalogued in the Catalytic Site Atlas (CSA) on four different datasets, and with those of two other methods: Amino acid pattern Search for Substructures And Motifs (ASSAM) and Catalytic Site Identification. The results show that GASS can correctly identify more than 90% of the templates searched. Experiments were also run using data from the substrate binding site prediction competition CASP 10, where GASS ranked fourth among the 18 methods considered.