Mulder Nicola J, Apweiler Rolf, Attwood Terri K, Bairoch Amos, Bateman Alex, Binns David, Biswas Margaret, Bradley Paul, Bork Peer, Bucher Phillip, Copley Richard, Courcelle Emmanuel, Durbin Richard, Falquet Laurent, Fleischmann Wolfgang, Gouzy Jerome, Griffith-Jones Sam, Haft Daniel, Hermjakob Henning, Hulo Nicolas, Kahn Daniel, Kanapin Alexander, Krestyaninova Maria, Lopez Rodrigo, Letunic Ivica, Orchard Sandra, Pagni Marco, Peyruc David, Ponting Chris P, Servant Florence, Sigrist Christian J A
EMBL Outstation, European Bioinformatics Institute, Hinxton, Cambridge, UK.
Brief Bioinform. 2002 Sep;3(3):225-35. doi: 10.1093/bib/3.3.225.
The exponential increase in the submission of nucleotide sequences to the nucleotide sequence database by genome sequencing centres has resulted in a need for rapid, automatic methods for classification of the resulting protein sequences. There are several signature and sequence cluster-based methods for protein classification, each resource having distinct areas of optimum application owing to the differences in the underlying analysis methods. In recognition of this, InterPro was developed as an integrated documentation resource for protein families, domains and functional sites, to rationalise the complementary efforts of the individual protein signature database projects. The member databases - PRINTS, PROSITE, Pfam, ProDom, SMART and TIGRFAMs - form the InterPro core. Related signatures from each member database are unified into single InterPro entries. Each InterPro entry includes a unique accession number, functional descriptions and literature references, and links are made back to the relevant member database(s). Release 4.0 of InterPro (November 2001) contains 4,691 entries, representing 3,532 families, 1,068 domains, 74 repeats and 15 sites of post-translational modification (PTMs) encoded by different regular expressions, profiles, fingerprints and hidden Markov models (HMMs). Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (2,141,621 InterPro hits from 586,124 SWISS-PROT and TrEMBL protein sequences). The database is freely accessible for text- and sequence-based searches.
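The entry structure described in the abstract (one accession number, a description, literature references, and links back to the contributing member-database signatures) can be sketched as a small data model. This is only an illustrative sketch, not InterPro's actual schema: the class names, field names, and the placeholder accessions (`IPR_EXAMPLE`, `PF_EXAMPLE`, `PS_EXAMPLE`) are all assumptions made for the example. Only the six member database names and the four entry types come from the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Member databases and entry types named in the abstract.
MEMBER_DATABASES = {"PRINTS", "PROSITE", "Pfam", "ProDom", "SMART", "TIGRFAMs"}
ENTRY_TYPES = {"family", "domain", "repeat", "PTM"}


@dataclass(frozen=True)
class Signature:
    """A signature from one member database (hypothetical structure)."""
    database: str
    accession: str


@dataclass
class InterProEntry:
    """One unified entry grouping related signatures (hypothetical structure)."""
    accession: str
    entry_type: str
    description: str
    references: List[str] = field(default_factory=list)
    signatures: List[Signature] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject entry types other than the four listed in the abstract.
        if self.entry_type not in ENTRY_TYPES:
            raise ValueError(f"unknown entry type: {self.entry_type}")

    def add_signature(self, sig: Signature) -> None:
        # Only signatures from the six core member databases are accepted.
        if sig.database not in MEMBER_DATABASES:
            raise ValueError(f"unknown member database: {sig.database}")
        self.signatures.append(sig)

    def member_links(self) -> Dict[str, List[str]]:
        # Group signature accessions by their source database,
        # mirroring the "links back to the member database(s)".
        links: Dict[str, List[str]] = {}
        for sig in self.signatures:
            links.setdefault(sig.database, []).append(sig.accession)
        return links


# Placeholder accessions, not real database identifiers.
entry = InterProEntry("IPR_EXAMPLE", "domain", "placeholder description")
entry.add_signature(Signature("Pfam", "PF_EXAMPLE"))
entry.add_signature(Signature("PROSITE", "PS_EXAMPLE"))
print(entry.member_links())
```

Grouping by source database keeps the unified entry separate from the signatures that support it, so a signature can be traced back to the resource that produced it.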