Pedrioli Patrick G A, Eng Jimmy K, Hubley Robert, Vogelzang Mathijs, Deutsch Eric W, Raught Brian, Pratt Brian, Nilsson Erik, Angeletti Ruth H, Apweiler Rolf, Cheung Kei, Costello Catherine E, Hermjakob Henning, Huang Sequin, Julian Randall K, Kapp Eugene, McComb Mark E, Oliver Stephen G, Omenn Gilbert, Paton Norman W, Simpson Richard, Smith Richard, Taylor Chris F, Zhu Weimin, Aebersold Ruedi
Institute for Systems Biology, 1441 North 34th Street, Seattle, Washington 98103-8904, USA.
Nat Biotechnol. 2004 Nov;22(11):1459-66. doi: 10.1038/nbt1031.
A broad range of mass spectrometers are used in mass spectrometry (MS)-based proteomics research. Each type of instrument possesses a unique design, data system and performance specifications, resulting in strengths and weaknesses for different types of experiments. Unfortunately, the native binary data formats produced by each type of mass spectrometer also differ and are usually proprietary. The diverse, nontransparent nature of the data structure complicates the integration of new instruments into preexisting infrastructure, impedes the analysis, exchange, comparison and publication of results from different experiments and laboratories, and prevents the bioinformatics community from accessing data sets required for software development. Here, we introduce the 'mzXML' format, an open, generic XML (extensible markup language) representation of MS data. We have also developed an accompanying suite of supporting programs. We expect that this format will facilitate data management, interpretation and dissemination in proteomics research.
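To make the idea of an open XML representation of MS data concrete, the following is a minimal sketch of reading spectra from an mzXML-like document using only the Python standard library. It is not taken from the paper or its accompanying programs. The element and attribute names (`scan`, `peaks`, `num`, `msLevel`, `peaksCount`, `precision`, `byteOrder`) and the peak encoding (base64 text holding big-endian 32-bit floats as alternating m/z and intensity values) reflect a common reading of the mzXML schema and should be treated as assumptions here. The sample document is hypothetical and omits the XML namespace that real files typically declare.

```python
import base64
import struct
import xml.etree.ElementTree as ET


def encode_peaks(pairs):
    """Pack (m/z, intensity) pairs as big-endian 32-bit floats, then base64.

    Assumed encoding; used here only to build the hypothetical sample below.
    """
    flat = [value for pair in pairs for value in pair]
    raw = struct.pack(f">{len(flat)}f", *flat)
    return base64.b64encode(raw).decode("ascii")


def decode_peaks(text):
    """Reverse encode_peaks: base64 text -> list of (m/z, intensity) tuples."""
    raw = base64.b64decode(text)
    count = len(raw) // 4  # number of 32-bit floats in the payload
    values = struct.unpack(f">{count}f", raw)
    return list(zip(values[0::2], values[1::2]))


def read_scans(xml_text):
    """Return {scan number: {"ms_level": int, "peaks": [(mz, intensity), ...]}}."""
    root = ET.fromstring(xml_text)
    scans = {}
    for scan in root.iter("scan"):
        peaks = scan.find("peaks")
        scans[int(scan.get("num"))] = {
            "ms_level": int(scan.get("msLevel")),
            "peaks": decode_peaks(peaks.text.strip()),
        }
    return scans


# Hypothetical one-scan document; values are chosen to be exact in float32.
SAMPLE = f"""<mzXML>
  <msRun scanCount="1">
    <scan num="1" msLevel="1" peaksCount="2">
      <peaks precision="32" byteOrder="network">{encode_peaks([(100.5, 2000.0), (250.25, 512.0)])}</peaks>
    </scan>
  </msRun>
</mzXML>"""

scans = read_scans(SAMPLE)
for number, scan in scans.items():
    print(number, scan["ms_level"], scan["peaks"])
```

Because the representation is plain XML with a documented peak encoding, a generic parser like this one can read spectra regardless of which instrument produced them, which is the interoperability benefit the abstract describes. A reader for real files would also need to handle the namespace, 64-bit precision and any compression the file declares.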