Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050, USA.
BMC Bioinformatics. 2013 Jan 28;14:30. doi: 10.1186/1471-2105-14-30.
The size of the protein sequence database has been exponentially increasing due to advances in genome sequencing. However, experimentally characterized proteins only constitute a small portion of the database, such that the majority of sequences have been annotated by computational approaches. Current automatic annotation pipelines inevitably introduce errors, making the annotations unreliable. Instead of such error-prone automatic annotations, functional interpretation should rely on annotations of 'reference proteins' that have been experimentally characterized or manually curated.
The Seq2Ref server uses BLAST to detect proteins homologous to a query sequence and identifies the reference proteins among them. Seq2Ref then reports publications with experimental characterizations of the identified reference proteins that might be relevant to the query. Furthermore, a plurality-based rating system is developed to evaluate the homologous relationships and rank the reference proteins by their relevance to the query.
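The detection step described above can be illustrated with a minimal sketch. It is not the Seq2Ref implementation: it assumes BLAST hits are available in the standard 12-column tabular format, that a set of reference protein identifiers (`reference_ids`, hypothetical) has been prepared separately, and it ranks by bit score as a simple stand-in, because the plurality-based rating system is not specified in this abstract. The function names and the E-value cutoff are illustrative assumptions.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject_id: str
    evalue: float
    bit_score: float


def parse_blast_tabular(text: str) -> list[Hit]:
    """Parse BLAST tabular output (default 12 columns of -outfmt 6)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        # Column 2 is the subject ID, 11 the E-value, 12 the bit score.
        hits.append(Hit(cols[1], float(cols[10]), float(cols[11])))
    return hits


def rank_reference_hits(hits, reference_ids, max_evalue=1e-3):
    """Keep hits that are reference proteins and rank them by bit score.

    Assumption: bit score is used here only as a placeholder ranking;
    it is not the plurality-based rating described in the text.
    """
    best = {}
    for hit in hits:
        if hit.subject_id not in reference_ids or hit.evalue > max_evalue:
            continue
        current = best.get(hit.subject_id)
        # Keep only the strongest alignment per reference protein.
        if current is None or hit.bit_score > current.bit_score:
            best[hit.subject_id] = hit
    return sorted(best.values(), key=lambda h: h.bit_score, reverse=True)
```

In this sketch, homologs that are not in the reference set are discarded even when they score higher than any reference protein, which mirrors the idea of relying only on experimentally characterized or manually curated entries.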
The reference proteins detected by our server lend insight into proteins of unknown function and provide extensive information for developing an in-depth understanding of uncharacterized proteins. Seq2Ref is available at: http://prodata.swmed.edu/seq2ref.