Wilke Andreas, Bischof Jared, Harrison Travis, Brettin Tom, D'Souza Mark, Gerlach Wolfgang, Matthews Hunter, Paczian Tobias, Wilkening Jared, Glass Elizabeth M, Desai Narayan, Meyer Folker
Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, Illinois, United States of America; Computation Institute, University of Chicago, Chicago, Illinois, United States of America.
Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, Illinois, United States of America.
PLoS Comput Biol. 2015 Jan 8;11(1):e1004008. doi: 10.1371/journal.pcbi.1004008. eCollection 2015 Jan.
Metagenomic sequencing has produced large amounts of data in recent years. For example, as of summer 2013, MG-RAST had been used to annotate over 110,000 data sets totaling over 43 terabases. As metagenomic sequencing finds ever wider adoption in the scientific community, the existing web-based analysis tools and infrastructure in MG-RAST provide only limited capability for data retrieval and analysis, such as comparative analysis across multiple data sets. Moreover, although the system provides many analysis tools, it is not comprehensive. By opening MG-RAST up via a web services API (application programming interface), we have greatly expanded access to MG-RAST data and provided a mechanism for using third-party analysis tools with that data. This RESTful API makes all data and data objects created by the MG-RAST pipeline accessible as JSON objects. We implemented the API as part of the DOE Systems Biology Knowledgebase project (KBase, http://kbase.us). It complements the existing MG-RAST web interface and forms the basis of KBase's microbial community capabilities. Because the API uses a RESTful (Representational State Transfer) implementation, it is compatible with most programming environments and should be easy for end users and third parties to use. It provides comprehensive access to sequence data, quality control results, annotations, and many other data types. Where feasible, we have used standards to expose data and metadata. Code examples are provided in a number of languages, both to show the versatility of the API and to give users a starting point. In short, we present an API that exposes the data in MG-RAST for consumption by our users, greatly enhancing the utility of the MG-RAST service.
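To make the "RESTful API returning JSON objects" description concrete, the following is a minimal Python sketch of how a client might compose a request and decode the response. It is an illustration under stated assumptions, not code from the paper: the base URL, the `metagenome` resource name, and the `verbosity` query parameter are not given in the abstract and should be checked against the MG-RAST API documentation. The helper names `build_url` and `fetch_json` are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: base URL of the public MG-RAST API (not stated in the abstract).
API_BASE = "https://api.mg-rast.org"


def build_url(resource, identifier=None, **params):
    """Compose a REST URL of the form BASE/resource[/id][?key=value&...]."""
    parts = [API_BASE, quote(resource)]
    if identifier is not None:
        parts.append(quote(identifier))
    url = "/".join(parts)
    if params:
        # Sort parameters so the generated URL is deterministic.
        url += "?" + urlencode(sorted(params.items()))
    return url


def fetch_json(url, opener=urlopen, timeout=30):
    """Send a GET request and decode the JSON body into Python objects.

    `opener` is injectable so the function can be exercised without
    network access; by default it is urllib's urlopen.
    """
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `fetch_json(build_url("metagenome", "mgm4440026.3", verbosity="metadata"))` would then return the decoded JSON object for one data set, assuming that identifier (used here only as a placeholder) and that parameter are accepted by the service. Because the interface is plain HTTP plus JSON, the same two steps carry over to most other programming environments.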