Rappoport Nadav, Linial Michal
School of Computer Science and Engineering and Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Givat Ram Campus, Jerusalem, 91904 Israel.
Database (Oxford). 2015 Apr 24;2015:bau122. doi: 10.1093/database/bau122. Print 2015.
ProtoBug (http://www.protobug.cs.huji.ac.il) is a database and resource of protein families in arthropod genomes. The ProtoBug platform presents the relatedness of the complete proteomes of 17 insects and of one crustacean, Daphnia pulex. The insect proteomes include louse, bee, beetle, ants, flies and mosquitoes. Using an unsupervised clustering method, the protein sequences were clustered into a hierarchical tree, called ProtoBug. ProtoBug covers about 300,000 sequences that are partitioned into families. At the default setting, all sequences are partitioned into ∼20,000 families (excluding singletons). From the species perspective, each of the 18 analysed proteomes is composed of 5000-8000 families. In the advanced operational mode, ProtoBug provides rich navigation capabilities for touring the hierarchy of families at any selected resolution. A proteome viewer shows the composition of sequences from any of the 18 analysed proteomes. Using functional annotation from an expert system (Pfam), we assigned domains, families and repeats through 4400 keywords that cover 73% of the sequences. A strict inference protocol is applied to expand the functional knowledge. Consequently, secured annotations were associated with 81% of the proteins and with 70% of the families (≥10 proteins each). ProtoBug is a database and web tool with rich visualization and navigation tools. It reports the properties of each family in relation to other families in the ProtoBug tree and in view of its taxonomic composition. Furthermore, users can paste their own sequences to find relatedness to any of the ProtoBug families. The database and the navigation tools are the basis for functional discoveries that span 350 million years of arthropod evolution. ProtoBug is available with no restriction at www.protobug.cs.huji.ac.il. Database URL: www.protobug.cs.huji.ac.il
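The abstract describes partitioning a hierarchical tree into families at a selected resolution, with singletons excluded. The following is a minimal sketch of that general idea only, not ProtoBug's actual algorithm or data model, which the abstract does not specify. The Node structure, the merge heights, the protein names and the threshold values are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node in a toy clustering tree (hypothetical structure, not ProtoBug's)."""
    height: float                      # merge height; 0.0 for a leaf
    protein: str = ""                  # set only for leaves
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    """Collect the protein names under a node."""
    if not node.children:
        return [node.protein]
    names: List[str] = []
    for child in node.children:
        names.extend(leaves(child))
    return names


def cut_tree(node: Node, threshold: float) -> List[List[str]]:
    """Return the leaf sets of the maximal subtrees whose height is <= threshold."""
    if node.height <= threshold:
        return [leaves(node)]
    clusters: List[List[str]] = []
    for child in node.children:
        clusters.extend(cut_tree(child, threshold))
    return clusters


def families(root: Node, threshold: float, min_size: int = 2) -> List[List[str]]:
    """Clusters at the chosen resolution, dropping singletons by default."""
    return [sorted(c) for c in cut_tree(root, threshold) if len(c) >= min_size]


def leaf(name: str) -> Node:
    return Node(0.0, name)


# Toy tree with made-up merge heights and protein names.
a = Node(0.2, children=[leaf("bee_P1"), leaf("ant_P7")])
b = Node(0.5, children=[a, leaf("fly_P3")])
c = Node(0.3, children=[leaf("louse_P2"), leaf("mosquito_P9")])
root = Node(0.9, children=[b, c, leaf("daphnia_P4")])

for t in (0.4, 0.6, 1.0):
    print(t, families(root, t))
```

Raising the threshold merges families into coarser groups, while lowering it splits them into finer ones; proteins that remain alone at a given threshold are left out of the family list.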