Joint Center for Structural Genomics (http://www.jcsg.org) Bioinformatics and Systems Biology Program, Sanford Burnham Medical Research Institute, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA; Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0446, USA.
Joint Center for Structural Genomics (http://www.jcsg.org) Bioinformatics and Systems Biology Program, Sanford Burnham Medical Research Institute, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA.
Nucleic Acids Res. 2014 Jul;42(Web Server issue):W430-5. doi: 10.1093/nar/gku450. Epub 2014 Jun 23.
PubServer, available at http://pubserver.burnham.org/, is a tool that automatically collects, filters and analyzes publications associated with groups of homologous proteins. Protein entries in databases such as the Entrez Protein database at NCBI contain information about publications associated with a given protein. The scope of these publications varies widely: they include studies focused on the biochemical functions of individual proteins, but also reports from genome sequencing projects that introduce tens of thousands of proteins. Collecting and analyzing publications related to sets of homologous proteins helps in the functional annotation of novel protein families and in improving annotations of well-studied protein families or individual genes. However, performing such collection and analysis manually is tedious and time-consuming. PubServer automatically collects identifiers of homologous proteins using PSI-BLAST, retrieves literature references from the corresponding database entries and filters out publications unlikely to contain useful information about individual proteins. It also prepares simple vocabulary statistics from titles, abstracts and MeSH terms to identify the most frequently occurring keywords, which may help to quickly identify common themes in these publications. The filtering criteria applied to the collected publications are user-adjustable. The results are presented as an interactive page that allows re-filtering and alternative presentations of the output.
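The filtering and vocabulary-statistics steps described above can be illustrated with a minimal Python sketch. It is not PubServer's implementation. It assumes that the number of protein entries citing each publication is already known (for example, gathered from database cross-references), and that a publication linked to more than a chosen threshold of proteins is treated as a genome-scale report. The function names, the `max_proteins` parameter, its default of 100 and the stop-word list are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not describe the one the server uses.
STOP_WORDS = {"the", "of", "and", "an", "in", "to", "for", "from", "with", "by", "on", "is", "are"}


def filter_publications(pub_protein_counts, max_proteins=100):
    """Return the IDs of publications linked to at most `max_proteins` protein entries.

    `pub_protein_counts` maps a publication ID to the number of protein
    entries that cite it (assumed to be collected beforehand). Publications
    above the threshold are treated as genome-scale reports and dropped.
    """
    return {pub_id for pub_id, n in pub_protein_counts.items() if n <= max_proteins}


def keyword_counts(texts, top_n=5):
    """Count in how many texts each word occurs and return the `top_n` most frequent.

    Each word is counted once per text, so the result is a document
    frequency. Ties are broken alphabetically to keep the output stable.
    """
    counts = Counter()
    for text in texts:
        # Lowercase tokens of two or more characters, starting with a letter.
        words = set(re.findall(r"[a-z][a-z0-9-]+", text.lower()))
        counts.update(words - STOP_WORDS)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

In this sketch, raising or lowering `max_proteins` plays the role of the user-adjustable filtering criterion: `filter_publications` is applied first, and the titles, abstracts or MeSH terms of the surviving publications are then passed to `keyword_counts`.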