Stanislaus Romesh, Jiang Liu Hong, Swartz Martha, Arthur John, Almeida Jonas S
Department of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina, Charleston, South Carolina, USA.
BMC Bioinformatics. 2004 Jan 29;5:9. doi: 10.1186/1471-2105-5-9.
Many proteomics initiatives require a seamless bioinformatics integration of the range of analytical steps between sample collection and systems modeling, in a form that the participants involved in the process can assess immediately. The path from proteomics profiling by 2D gel electrophoresis to the putative identification of differentially expressed proteins, by comparison of mass spectrometry results with reference databases, includes many components of sample processing, not just analysis and interpretation, that are regularly revisited and updated. Such updates, and the dissemination of the resulting data, require a suitable data structure. However, no data structure is currently available for storing, in a single XML file, the data for multiple gels generated in a single proteomics experiment. This paper proposes a data structure based on XML standards to fill the gap between the data generated by proteomics experiments and the storage of those data.
To address the resulting procedural fluidity, we have adopted and implemented a data model centered on the concept of the annotated gel (AG) as the format for delivery and management of 2D gel electrophoresis results. An eXtensible Markup Language (XML) schema is proposed to manage, analyze and disseminate annotated 2D gel electrophoresis results. The structure of AG objects is formally represented using XML, resulting in the definition of the AGML syntax presented here.
The proposed schema accommodates data on the electrophoresis results as well as the mass spectrometry analysis of selected gel spots. A web-based software library is being developed to handle data storage, analysis and graphic representation. The computational tools described will be made available at http://bioinformatics.musc.edu/agml. AGML provides a simple data structure for storing 2D gel electrophoresis data.
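The idea of keeping several annotated gels, together with the mass spectrometry data for selected spots, in one XML file can be illustrated with a minimal Python sketch. This is not the AGML schema itself: the element and attribute names (experiment, gel, spot, massSpectrum, peak), the helper functions and the sample values are all hypothetical, chosen only to show the nesting the abstract describes.

```python
import xml.etree.ElementTree as ET


def build_experiment(gels):
    """Build one XML tree that holds every gel of a single experiment.

    `gels` maps a gel id to a list of spot dicts with the keys
    'id', 'x', 'y', 'volume' and an optional 'peaks' list of m/z values.
    All element and attribute names are hypothetical, not AGML's own.
    """
    root = ET.Element("experiment")
    for gel_id, spots in gels.items():
        gel = ET.SubElement(root, "gel", id=gel_id)
        for spot in spots:
            node = ET.SubElement(
                gel,
                "spot",
                id=spot["id"],
                x=str(spot["x"]),
                y=str(spot["y"]),
                volume=str(spot["volume"]),
            )
            # Only spots selected for mass spectrometry carry a spectrum.
            peaks = spot.get("peaks")
            if peaks:
                spectrum = ET.SubElement(node, "massSpectrum")
                for mz in peaks:
                    ET.SubElement(spectrum, "peak", mz=str(mz))
    return root


def spots_with_mass_spec(root):
    """Return (gel id, spot id) pairs for spots that have a spectrum."""
    return [
        (gel.get("id"), spot.get("id"))
        for gel in root.iter("gel")
        for spot in gel.iter("spot")
        if spot.find("massSpectrum") is not None
    ]


# Made-up sample data: two gels from one experiment.
gels = {
    "control": [
        {"id": "s1", "x": 120, "y": 340, "volume": 0.82},
        {"id": "s2", "x": 415, "y": 210, "volume": 1.37,
         "peaks": [1045.56, 1296.68]},
    ],
    "treated": [
        {"id": "s1", "x": 122, "y": 338, "volume": 2.05,
         "peaks": [842.51]},
    ],
}

# Serialize all gels to a single XML string, then read it back.
xml_text = ET.tostring(build_experiment(gels), encoding="unicode")
reparsed = ET.fromstring(xml_text)
print(spots_with_mass_spec(reparsed))
```

Nesting spots under gels, and spectra under spots, lets one file carry the whole experiment while the mass spectrometry annotation stays optional per spot.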