Center for Integrative Bioinformatics Vienna, Max F Perutz Laboratories, Dr Bohrgasse 9, A-1030 Vienna, Austria.
BMC Bioinformatics. 2010 Aug 9;11:417. doi: 10.1186/1471-2105-11-417.
The increasing number of sequenced genomes provides the basis for exploring the genetic and functional diversity within the tree of life. Only a tiny fraction of the encoded proteins has been thoroughly characterized experimentally. For the remainder, bioinformatics annotation tools are the only means to infer their function. The prevalent method for deducing functional equivalence exploits significant sequence similarity to already characterized proteins, which is commonly taken as evidence for homology. Such methods fail when homologs are too divergent, or when they have assumed a different function. Finally, due to convergent evolution, functional equivalence is not necessarily linked to common ancestry. Therefore, complementary approaches are required to identify functional equivalents.
We present the Feature Architecture Comparison Tool (FACT, http://www.cibiv.at/FACT) to search for functionally equivalent proteins. FACT uses the similarity between the feature architectures of two proteins, i.e., the arrangements of functional domains, secondary structure elements and compositional properties, as a proxy for their functional equivalence. A scoring function measures feature architecture similarity, which enables searching for functional equivalents in entire proteomes. Our evaluation of 9,570 EC-classified enzymes revealed that FACT, using the full feature set, outperformed existing architecture-based approaches by identifying significantly more functional equivalents as highest-scoring proteins. We show that FACT can identify functional equivalents that share no significant sequence similarity. However, when the highest-scoring protein of FACT is also the protein with the highest local sequence similarity, it is functionally equivalent to the query in 99% of cases. We demonstrate the versatility of FACT by identifying a missing link in yeast glutathione metabolism and by searching for the equivalent of human GolgA5 in Trypanosoma brucei.
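The abstract does not spell out FACT's scoring function. As a minimal sketch of the general idea, assuming a feature architecture is represented as the ordered list of feature labels along the sequence (e.g. domain identifiers, secondary structure elements, low-complexity regions), the hypothetical score below combines an order-independent term (cosine similarity of feature counts) with an order-dependent term (longest common subsequence of the two architectures) and uses it to rank a proteome. The names `architecture_score` and `rank_proteome` and the weight `w` are illustrative assumptions, not FACT's published method.

```python
from collections import Counter
from math import sqrt


def multiplicity_similarity(a, b):
    """Cosine similarity of feature-count vectors; ignores feature order."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[f] * cb[f] for f in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def order_similarity(a, b):
    """Longest common subsequence of two architectures, normalised by the longer one."""
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / max(len(a), len(b))


def architecture_score(a, b, w=0.5):
    """Hypothetical weighted mix of the order-independent and order-dependent terms."""
    return w * multiplicity_similarity(a, b) + (1 - w) * order_similarity(a, b)


def rank_proteome(query, proteome):
    """Return protein IDs sorted from most to least similar architecture."""
    return sorted(proteome, key=lambda pid: architecture_score(query, proteome[pid]), reverse=True)


if __name__ == "__main__":
    query = ["PF00069", "coil", "PF00433"]
    proteome = {"A": ["PF00069", "coil", "PF00433"], "B": ["PF00069"], "C": ["coil", "LCR"]}
    print(rank_proteome(query, proteome))
```

Because only feature labels are compared, two proteins can score highly without any detectable sequence similarity, which is the property the abstract relies on for finding non-homologous equivalents.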
FACT facilitates a quick and sensitive search for functionally equivalent proteins in entire proteomes. FACT is complementary to approaches using sequence similarity to identify proteins with the same function. Thus, FACT is particularly useful when functional equivalents need to be identified in evolutionarily distant species, or when functional equivalents are not homologous. The most reliable annotation transfers, however, are achieved when feature architecture similarity and sequence similarity are jointly taken into account.
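The recommendation to combine both kinds of evidence can be sketched as a simple decision rule. This is a minimal sketch, assuming each search yields a dictionary mapping candidate protein IDs to scores (FACT scores on one side, local alignment scores such as BLAST bit scores on the other); the function names and evidence labels are hypothetical. Only the "consensus" case corresponds to the situation for which the abstract reports 99% functional equivalence.

```python
from typing import Optional


def top_hit(scores: dict[str, float]) -> Optional[str]:
    """Highest-scoring candidate ID, or None if the search returned nothing."""
    return max(scores, key=scores.get) if scores else None


def annotation_evidence(fact_scores: dict[str, float], seq_scores: dict[str, float]):
    """Classify the support for transferring the top FACT hit's function (hypothetical labels)."""
    fact_best = top_hit(fact_scores)
    seq_best = top_hit(seq_scores)
    if fact_best is None:
        return ("no_hit", None)
    if fact_best == seq_best:
        # Architecture and local sequence similarity agree on the same protein.
        return ("consensus", fact_best)
    if seq_best is None:
        # No sequence hit at all: candidate non-homologous functional equivalent.
        return ("architecture_only", fact_best)
    # Both searches found hits, but different top proteins.
    return ("discordant", fact_best)
```

Keeping the "architecture_only" case separate preserves FACT's main use, finding equivalents with no significant sequence similarity, while flagging that such transfers rest on weaker evidence than consensus hits.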