Odronitz Florian, Kollmar Martin
Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Goettingen, Germany.
BMC Genomics. 2006 Nov 29;7:300. doi: 10.1186/1471-2164-7-300.
Annotation of protein sequences of eukaryotic organisms is crucial for understanding their function in the cell. Manual annotation is still by far the most accurate way to predict genes correctly. The classification of protein sequences, the determination of their phylogenetic relationships and the assignment of function involve information from various sources. This often leads to heterogeneous collections of data that are hard to track. Cytoskeletal and motor proteins form large and diverse superfamilies comprising up to several dozen members per organism. To date, there is no integrated tool available to assist in the manual, large-scale comparative genomic analysis of protein families.
Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database-driven online working environment for the analysis of manually annotated protein sequences and their relationships. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relationships and sequencing projects, as well as links to literature and domain predictions. Sequences can be imported from the multiple sequence alignments that are generated during the annotation process. A web interface allows users to browse the database conveniently and to compile tabular and graphical summaries of its content.
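The abstract does not describe how Pfarao imports alignments or how its database is laid out. As a rough illustration only, the sketch below shows one way rows of a FASTA-formatted multiple sequence alignment could be stored in a relational database so that each sequence is linked to its species. The schema (`species`, `protein` tables), the header convention `>name Species name`, and the functions `parse_aligned_fasta` and `import_alignment` are all hypothetical and are not Pfarao's actual implementation; it uses only Python's standard `sqlite3` module.

```python
import sqlite3

# Hypothetical schema for illustration; Pfarao's real tables are not described.
SCHEMA = """
CREATE TABLE IF NOT EXISTS species (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS protein (
    id         INTEGER PRIMARY KEY,
    name       TEXT UNIQUE NOT NULL,
    species_id INTEGER NOT NULL REFERENCES species(id),
    aligned    TEXT NOT NULL,  -- alignment row, gap characters kept
    sequence   TEXT NOT NULL   -- gap-free protein sequence
);
"""


def parse_aligned_fasta(text):
    """Yield (header, aligned_sequence) pairs from FASTA-formatted alignment text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def import_alignment(conn, text):
    """Store every alignment row; assumes (hypothetically) '>name Species name' headers."""
    conn.executescript(SCHEMA)
    count = 0
    for header, aligned in parse_aligned_fasta(text):
        name, _, species = header.partition(" ")
        # Create the species once, then look up its id for the foreign key.
        conn.execute("INSERT OR IGNORE INTO species(name) VALUES (?)", (species,))
        (species_id,) = conn.execute(
            "SELECT id FROM species WHERE name = ?", (species,)
        ).fetchone()
        ungapped = aligned.replace("-", "").replace(".", "")
        conn.execute(
            "INSERT OR REPLACE INTO protein(name, species_id, aligned, sequence) "
            "VALUES (?, ?, ?, ?)",
            (name, species_id, aligned, ungapped),
        )
        count += 1
    conn.commit()
    return count


if __name__ == "__main__":
    # Invented example data, not taken from CyMoBase.
    msa = ">Myo1a_Hs Homo sapiens\nMAE--LKK\n>Myo1a_Mm Mus musculus\nMAEQ-LKR\n"
    conn = sqlite3.connect(":memory:")
    print(import_alignment(conn, msa))  # 2
    for row in conn.execute(
        "SELECT p.name, s.name, p.sequence FROM protein p "
        "JOIN species s ON p.species_id = s.id ORDER BY p.name"
    ):
        print(row)
```

Keeping both the aligned row and the gap-free sequence is one possible design choice: the aligned form preserves positional correspondence for later comparative analysis, while the ungapped form is what a sequence-centric browser would display.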
We implemented a protein sequence-centric web application to store, organize, interrelate and present the heterogeneous data generated in manual genome annotation and comparative genomics. The application was developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted to any protein family.