European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK.
BMC Bioinformatics. 2010 May 11;11:240. doi: 10.1186/1471-2105-11-240.
The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future.
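To make the quadratic growth concrete, the short sketch below counts unordered species pairs. It assumes, purely for illustration, that every species is compared against every other one; the abstract only says the growth is approximately quadratic, and the function name `pairwise_comparisons` is hypothetical.

```python
def pairwise_comparisons(n_species: int) -> int:
    """Return the number of unordered species pairs, n * (n - 1) / 2.

    Illustrative assumption: an all-against-all comparison. The actual
    pipelines may not compare every pair of species.
    """
    return n_species * (n_species - 1) // 2


# Print the pair count for a few species counts, including the 50
# species mentioned in the text.
for n in (10, 25, 50, 100):
    print(n, pairwise_comparisons(n))
```

Under this assumption, doubling the number of species roughly quadruples the number of pairs, which is why a fixed two-week window becomes harder to meet as species are added.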
We present eHive, a new fault-tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network-distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system, a MySQL database serves as the central blackboard, and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole-genome alignments, (2) multiple whole-genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real-world scenarios.
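The blackboard-and-agent pattern described above can be sketched in a few lines. This is not eHive's code or schema: it is a minimal illustration that assumes an in-memory SQLite database standing in for MySQL, a Python loop standing in for the Perl agent, and hypothetical table, column and function names (`job`, `dataflow_rule`, `run_worker`, `MAX_RETRIES`).

```python
import sqlite3

MAX_RETRIES = 3  # assumed retry limit, not taken from the paper


def make_blackboard():
    """Create the shared job table and dataflow rules (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE job (
            id INTEGER PRIMARY KEY,
            analysis TEXT NOT NULL,
            input TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'READY',
            retries INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE dataflow_rule (
            from_analysis TEXT NOT NULL,
            to_analysis TEXT NOT NULL
        );
    """)
    return db


def add_job(db, analysis, input_id):
    db.execute("INSERT INTO job (analysis, input) VALUES (?, ?)",
               (analysis, input_id))


def claim_job(db):
    """Take the oldest READY job and mark it RUNNING.

    With several concurrent agents this claim would need to be atomic;
    a single agent is assumed here.
    """
    row = db.execute("SELECT id, analysis, input FROM job "
                     "WHERE status = 'READY' ORDER BY id LIMIT 1").fetchone()
    if row is not None:
        db.execute("UPDATE job SET status = 'RUNNING' WHERE id = ?", (row[0],))
    return row


def run_worker(db, runnables):
    """Agent loop: query the blackboard and run jobs until none are READY."""
    while (job := claim_job(db)) is not None:
        job_id, analysis, input_id = job
        try:
            output = runnables[analysis](input_id)
        except Exception:
            # Fault tolerance: put the job back, or give up after MAX_RETRIES.
            db.execute(
                "UPDATE job SET retries = retries + 1, status = CASE "
                "WHEN retries + 1 >= ? THEN 'FAILED' ELSE 'READY' END "
                "WHERE id = ?", (MAX_RETRIES, job_id))
            continue
        db.execute("UPDATE job SET status = 'DONE' WHERE id = ?", (job_id,))
        # Dataflow: each rule seeds a job in the downstream analysis.
        targets = db.execute("SELECT to_analysis FROM dataflow_rule "
                             "WHERE from_analysis = ?", (analysis,)).fetchall()
        for (target,) in targets:
            add_job(db, target, output)


# Usage: a two-step pipeline in which 'align' flows into 'summarise'.
db = make_blackboard()
db.execute("INSERT INTO dataflow_rule VALUES ('align', 'summarise')")
add_job(db, "align", "human:mouse")
run_worker(db, {"align": lambda x: x + ":aligned",
                "summarise": lambda x: x + ":summary"})
rows = db.execute("SELECT analysis, input, status FROM job ORDER BY id").fetchall()
print(rows)
```

The design point the sketch tries to capture is that the agent holds no pipeline logic of its own: all state (which jobs exist, what runs next, what failed) lives on the shared blackboard, so an agent can stop at any time and another can pick up where it left off.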
eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.