Krebs Werner G, Bourne Philip E
San Diego Supercomputer Center, San Diego, California, USA.
Bioinformatics. 2004 May 1;20(7):1066-73. doi: 10.1093/bioinformatics/bth039. Epub 2004 Feb 5.
Molecular biologists routinely assign putative functional annotations to proteins by comparative analysis against pre-defined experimental annotations. In this era of high-throughput proteomics, the number and statistical significance of these assignments remain a challenge. A combined statistical method is described that enables robust, automated protein annotation by reliably expanding existing annotation sets. The method requires an existing clustering scheme based on relevant experimental information (e.g. sequence identity, keywords or gene expression data). It assigns new proteins to these clusters with a measure of reliability. It can also give human reviewers a reliability score for both new and previously classified proteins.
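The abstract does not spell out the statistic, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' published method. It swaps in a hypergeometric enrichment test. Assume a query protein has a list of already-classified "neighbors" (for example, hits from a sequence similarity search). The sketch then asks how surprising it is that k of those n neighbors fall in a cluster of size K, out of N classified proteins. The smallest Bonferroni-corrected p-value serves as the reliability measure for assigning a new protein. A leave-one-out variant scores proteins that are already classified. All function names (`hypergeom_sf`, `cluster_pvalues`, `assign`, `score_existing`) and the `alpha` threshold are hypothetical.

```python
from collections import Counter
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N proteins, K in the cluster, n draws)."""
    upper = min(K, n)
    if k > upper:
        return 0.0
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def cluster_pvalues(neighbors, cluster_of, cluster_sizes, n_classified):
    """Enrichment p-value of each cluster among a query's classified neighbors.

    neighbors: IDs judged similar to the query (hypothetical input, e.g. from a
    sequence search); cluster_of: classified ID -> cluster label;
    cluster_sizes: label -> number of classified members.
    """
    hits = [cluster_of[p] for p in neighbors if p in cluster_of]
    n = len(hits)
    return {label: hypergeom_sf(k, n_classified, cluster_sizes[label], n)
            for label, k in Counter(hits).items()}


def assign(neighbors, cluster_of, cluster_sizes, n_classified, alpha=0.01):
    """Assign a new protein to its most enriched cluster, or None if not reliable.

    Returns (label_or_None, p) where p is Bonferroni-corrected over all clusters.
    """
    pvals = cluster_pvalues(neighbors, cluster_of, cluster_sizes, n_classified)
    if not pvals:
        return None, 1.0
    label, p = min(pvals.items(), key=lambda item: item[1])
    corrected = min(1.0, p * len(cluster_sizes))
    return (label if corrected <= alpha else None), corrected


def score_existing(pid, neighbors, cluster_of, cluster_sizes, n_classified):
    """Leave-one-out reliability (uncorrected p) of a classified protein's own cluster."""
    own = cluster_of[pid]
    sizes = dict(cluster_sizes)
    sizes[own] -= 1  # exclude the protein itself from its cluster's size
    others = [p for p in neighbors if p != pid]
    pvals = cluster_pvalues(others, cluster_of, sizes, n_classified - 1)
    return pvals.get(own, 1.0)
```

A low p-value from `assign` marks a confident new assignment. A high p-value from `score_existing` flags an existing classification that a human reviewer might want to re-examine. Real pipelines would likely weight neighbors by similarity score rather than counting them equally.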
A dataset of 27 000 annotated Protein Data Bank (PDB) polypeptide chains (of 36 000 chains currently in the PDB) was generated from 23 000 chains classified a priori.
PDB annotations and sample software implementation are freely accessible on the Web at http://pmr.sdsc.edu/go