Brief Bioinform. 2019 Jul 19;20(4):1151-1159. doi: 10.1093/bib/bbx105.
As technologies change, MG-RAST is adapting. Newly available software is being incorporated to improve accuracy and performance. As a computational service that continuously runs large-volume scientific workflows, MG-RAST is well placed for benchmarking and for implementing algorithmic or platform improvements, which in many cases involve trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1-3] is an example: we currently use existing, well-studied data sets in MG-RAST as gold standards, representing different environments and different technologies, to evaluate any change to the pipeline. The use of artificial data sets for pipeline performance optimization has not added value, as these data sets do not present the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improving the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners. These versions will enable double barcoding and stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard for specifying bioinformatics workflows, to facilitate both the development and the efficient high-performance implementation of the community's data analysis tasks.
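The sensitivity and specificity trade-off mentioned above can be made concrete with a small worked example. The following is a minimal sketch, not MG-RAST's actual benchmarking code: the function name `score_against_gold`, the `BenchmarkResult` type and the single-letter feature labels are all hypothetical, and it assumes each pipeline run can be reduced to a set of reported labels compared against a gold-standard set drawn from a known universe of candidates. Run-time cost is not measured here.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class BenchmarkResult:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)


def score_against_gold(predicted: Set[str], gold: Set[str],
                       universe: Set[str]) -> BenchmarkResult:
    """Score one pipeline run against a gold standard (hypothetical helper).

    Assumes `predicted` and `gold` are both subsets of `universe`.
    """
    tp = len(predicted & gold)             # reported and truly present
    fp = len(predicted - gold)             # reported but not present
    fn = len(gold - predicted)             # present but missed
    tn = len(universe - predicted - gold)  # correctly not reported
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return BenchmarkResult(sensitivity, specificity)


# Illustrative labels only; these are not real annotation identifiers.
universe = {"a", "b", "c", "d", "e", "f", "g", "h"}
gold = {"a", "b", "c", "d"}
predicted = {"a", "b", "c", "e"}

result = score_against_gold(predicted, gold, universe)
print(result.sensitivity, result.specificity)  # 0.75 0.75
```

Scoring two pipeline versions on the same gold-standard set in this way, alongside their wall-clock times, gives one simple basis for judging whether a faster tool keeps sensitivity at an acceptable level.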