ProFAT: a web-based tool for the functional annotation of protein sequences.

Author information

Bradshaw Charles Richard, Surendranath Vineeth, Habermann Bianca

Affiliation

Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307 Dresden, Germany.

Publication information

BMC Bioinformatics. 2006 Oct 23;7:466. doi: 10.1186/1471-2105-7-466.

DOI:10.1186/1471-2105-7-466

PMID:17059594

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC1636073/

Abstract

BACKGROUND

The functional annotation of proteins relies on published information about their close and remote homologues in sequence databases. Evidence for remote sequence similarity is strengthened further when the query sequence and the identified database sequences share a similar biological background. So far, however, few tools provide a means of including functional information in sequence database searches.

RESULTS

We present ProFAT, a web-based tool for the functional annotation of protein sequences based on remote sequence similarity. ProFAT combines sensitive sequence database search methods and a fold recognition algorithm with a simple text-mining approach. It selects identified hits according to their biological background by keyword-mining the annotations, the features and, most importantly, the literature associated with a sequence entry. A user-provided keyword list lets the user search specifically for weak but biologically relevant homologues of an input query. The ProFAT server was evaluated using the complete set of proteins from three different domain families, including their weak relatives, and correctly identified between 90% and 100% of all domain family members studied in this context. ProFAT has furthermore been applied to a variety of proteins from different cellular contexts, and we provide evidence of how it can help in the functional prediction of proteins based on remotely conserved proteins.
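
The keyword-mining step described above can be illustrated with a minimal sketch. It is not ProFAT's implementation: the `Hit` record, its field names, the `filter_hits` function and the E-value cutoff are all hypothetical and chosen only for illustration. The sketch assumes that each database hit already carries its annotation, feature and literature text, and keeps a weak hit only if at least one user-supplied keyword occurs in that text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Hit:
    """One database hit. All field names are hypothetical, not ProFAT's own."""
    accession: str
    e_value: float
    annotation: str = ""
    features: str = ""
    literature: List[str] = field(default_factory=list)


def keyword_matches(hit: Hit, keywords: List[str]) -> List[str]:
    """Return the keywords found (case-insensitively) in any text attached to the hit."""
    text = " ".join([hit.annotation, hit.features, *hit.literature]).lower()
    return [kw for kw in keywords if kw.lower() in text]


def filter_hits(hits: List[Hit], keywords: List[str],
                max_e_value: float = 10.0) -> List[Tuple[Hit, List[str]]]:
    """Keep hits below a permissive E-value cutoff that mention at least one keyword.

    The cutoff is an assumed, deliberately loose value so that weak
    (remote) hits survive long enough to be judged by their text.
    """
    kept = []
    for hit in hits:
        if hit.e_value > max_e_value:
            continue
        matched = keyword_matches(hit, keywords)
        if matched:
            kept.append((hit, matched))
    # Strongest sequence evidence first.
    kept.sort(key=lambda pair: pair[0].e_value)
    return kept
```

With made-up hits, a call such as `filter_hits(hits, ["ubiquitin"])` would return only those entries whose annotation, features or literature text mentions the keyword, ordered by E-value. Plain substring matching is the simplest possible choice here; a real text-mining step would likely need tokenisation and synonym handling.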

CONCLUSION

By employing sensitive database search programs and exploiting the functional information associated with database sequences, ProFAT can detect remote but biologically relevant relationships between proteins, and will assist researchers in predicting protein function based on remote homologies.
