Wilke Andreas, Glass Elizabeth M, Bartels Daniela, Bischof Jared, Braithwaite Daniel, D'Souza Mark, Gerlach Wolfgang, Harrison Travis, Keegan Kevin, Matthews Hunter, Kottmann Renzo, Paczian Tobias, Tang Wei, Trimble William L, Yilmaz Pelin, Wilkening Jared, Desai Narayan, Meyer Folker
Argonne National Laboratory, Lemont, Illinois, USA; University of Chicago, Chicago, Illinois, USA.
Methods Enzymol. 2013;531:487-523. doi: 10.1016/B978-0-12-407863-5.00022-8.
The democratized world of sequencing is leading to numerous data analysis challenges; MG-RAST addresses many of these challenges for diverse datasets, including amplicon datasets, shotgun metagenomes, and metatranscriptomes. The changes from version 2 to version 3 include the addition of a dedicated gene calling stage using FragGeneScan, clustering of predicted proteins at 90% identity, and the use of BLAT for the computation of similarities. Together with changes in the underlying software infrastructure, these changes have enabled a dramatic scaling up of pipeline throughput on a limited hardware budget. The Web-based service allows upload, fully automated analysis, and visualization of results. As a result of the plummeting cost of sequencing and the readily available analytical power of MG-RAST, over 78,000 metagenomic datasets have been analyzed, with over 12,000 of them publicly available in MG-RAST.