Gan Zhuohui, Stowe Jennifer C, Altintas Ilkay, McCulloch Andrew D, Zambon Alexander C
Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA.
San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA, USA.
Procedia Comput Sci. 2014;29:2162-2167. doi: 10.1016/j.procs.2014.05.201.
A growing number of genomic technologies are producing massive amounts of genomic data, all of which require complex analysis. Scientists are developing more and more bioinformatics tools to simplify these analyses. However, different pipelines have been built in different software environments, which makes integrating these diverse tools difficult. Kepler provides an open source environment for integrating such disparate packages. Using Kepler, we integrated several external tools, including Bioconductor packages, AltAnalyze (a Python-based open source tool), and an R-based comparison tool, to build an automated workflow for meta-analysis of both online and local microarray data. The automated workflow connects the integrated tools seamlessly and passes data between them smoothly, thereby improving the efficiency and accuracy of complex data analyses. Our workflow exemplifies the use of Kepler as a scientific workflow platform for bioinformatics pipelines.