Laboratory of Neuro Imaging (LONI), University of California, Los Angeles, Los Angeles, CA 90095, USA.
BMC Bioinformatics. 2011 Jul 26;12:304. doi: 10.1186/1471-2105-12-304.
Contemporary informatics and genomics research requires efficient, flexible and robust management of large heterogeneous data, advanced computational tools, powerful visualization, reliable hardware infrastructure, interoperability of computational resources, and detailed provenance for data and analysis protocols. The LONI Pipeline is a client-server distributed computational environment that facilitates the graphical construction, execution, monitoring, validation and dissemination of advanced data analysis protocols.
This paper reports on applications of the LONI Pipeline environment to two informatics challenges: graphical management of diverse genomics tools, and interoperability of informatics software. Specifically, the manuscript presents the concrete details of deploying general informatics suites and individual software tools to new hardware infrastructures; designing, validating and executing new visual analysis protocols via the Pipeline graphical interface; and integrating diverse informatics tools via the Pipeline eXtensible Markup Language syntax. We demonstrate each of these processes using several established informatics packages (e.g., miBLAST, EMBOSS, mrFAST, GWASS, MAQ, SAMtools, Bowtie) for basic local sequence alignment and search, molecular biology data analysis, and genome-wide association studies. These examples demonstrate the power of the Pipeline graphical workflow environment to integrate bioinformatics resources that provide a well-defined syntax for dynamic specification of input/output parameters and run-time execution controls.
The LONI Pipeline environment (http://pipeline.loni.ucla.edu) provides a flexible graphical infrastructure for efficient biomedical computing and distributed informatics research. The interactive Pipeline resource manager enables the use and interoperability of diverse types of informatics resources. The Pipeline client-server model provides computational power to a broad spectrum of informatics investigators: experienced developers and novice users, users with or without access to advanced computational resources (e.g., Grid, data), and basic and translational scientists. The open development, validation and dissemination of computational networks (pipeline workflows) facilitates the sharing of knowledge, tools, protocols and best practices, and enables the unbiased validation and replication of scientific findings by the entire community.