Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA.
Bioinformatics. 2010 Jun 15;26(12):1488-92. doi: 10.1093/bioinformatics/btq167. Epub 2010 Apr 22.
The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for managing large datasets. Software is also needed to simplify data management and to make large-scale bioinformatics analysis accessible to, and reproducible by, a wide range of users.
We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user-friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects.
Ergatis is an open-source project and is freely available at http://ergatis.sourceforge.net.