Perez-Riverol Yasset, Uszkoreit Julian, Sanchez Aniel, Ternent Tobias, Del Toro Noemi, Hermjakob Henning, Vizcaíno Juan Antonio, Wang Rui
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
Ruhr-Universität Bochum, Medizinisches Proteom-Center, Medical Bioinformatics, ZKF, E.142, Universitätsstr. 150, D-44801 Bochum, Germany.
Bioinformatics. 2015 Sep 1;31(17):2903-5. doi: 10.1093/bioinformatics/btv250. Epub 2015 Apr 24.
The ms-data-core-api is a free, open-source library for developing computational proteomics tools and pipelines. The Application Programming Interface, written in Java, enables rapid tool creation by providing a robust, pluggable programming interface and a common data model. The data model is based on controlled vocabularies/ontologies and captures the whole range of data types found in common proteomics experimental workflows, from spectra through peptide/protein identifications to quantitative results. The library contains readers for three of the most widely used Proteomics Standards Initiative standard file formats: mzML, mzIdentML, and mzTab. In addition to mzML, it supports other common mass spectra data formats: dta, ms2, mgf, pkl and apl (text-based), and mzXML and mzData (XML-based). It can also read PRIDE XML, the original format used by the PRIDE database, one of the world-leading proteomics resources. Finally, we present a set of algorithms and tools whose implementation illustrates the simplicity of developing applications using the library.
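To make the list of supported formats concrete, the following is a minimal Java sketch of how a tool might map a file name to one of the formats named above before handing it to a reader. It does not use or reproduce the ms-data-core-api itself: the class `FormatSketch`, the enum `Format`, the method `detect`, and the file extensions chosen for each format are all assumptions made for illustration. The text-based versus XML-based labels for the spectra formats follow the abstract; mzIdentML and PRIDE XML are marked as XML and mzTab as tab-delimited text.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical illustration only: none of these names come from ms-data-core-api.
final class FormatSketch {

    /** Formats named in the abstract, with an assumed file extension for each. */
    enum Format {
        MZML("mzml", true),
        MZIDENTML("mzid", true),
        MZTAB("mztab", false),
        DTA("dta", false),
        MS2("ms2", false),
        MGF("mgf", false),
        PKL("pkl", false),
        APL("apl", false),
        MZXML("mzxml", true),
        MZDATA("mzdata", true),
        PRIDE_XML("xml", true); // assumption: a bare .xml file is treated as PRIDE XML

        final String extension; // lower-case extension without the dot (assumed)
        final boolean xmlBased; // true for XML-based formats, false for text-based

        Format(String extension, boolean xmlBased) {
            this.extension = extension;
            this.xmlBased = xmlBased;
        }
    }

    /** Guesses the format from the file extension, ignoring case. */
    static Optional<Format> detect(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension to inspect
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        for (Format format : Format.values()) {
            if (format.extension.equals(ext)) {
                return Optional.of(format);
            }
        }
        return Optional.empty(); // unknown extension
    }

    public static void main(String[] args) {
        String[] examples = {"run01.mzML", "search.mzid", "peaks.MGF", "readme.txt"};
        for (String name : examples) {
            String label = detect(name)
                    .map(f -> f + (f.xmlBased ? " (XML-based)" : " (text-based)"))
                    .orElse("unsupported");
            System.out.println(name + " -> " + label);
        }
    }
}
```

A real tool would usually inspect the file content as well, since an extension such as `.xml` is ambiguous across several of these formats.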
The software is freely available at https://github.com/PRIDE-Utilities/ms-data-core-api.
Supplementary data are available at Bioinformatics online.