Kasalica Vedran, Schwämmle Veit, Palmblad Magnus, Ison Jon, Lamprecht Anna-Lena
Department of Information and Computing Sciences, Utrecht University, Utrecht 3584 CC, The Netherlands.
Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense 5230, Denmark.
J Proteome Res. 2021 Apr 2;20(4):2157-2165. doi: 10.1021/acs.jproteome.0c00983. Epub 2021 Mar 15.
The bio.tools registry is a main catalogue of computational tools in the life sciences. More than 17 000 tools have been registered by the international bioinformatics community. The bio.tools metadata schema includes semantic annotations of tool functions, that is, formal descriptions of tools' data types, formats, and operations with terms from the EDAM bioinformatics ontology. Such annotations enable the automated composition of tools into multistep pipelines or workflows. In this Technical Note, we revisit a previous case study on the automated composition of proteomics workflows. We use the same four workflow scenarios but instead of using a small set of tools with carefully handcrafted annotations, we explore workflows directly on bio.tools. We use the Automated Pipeline Explorer (APE), a reimplementation and extension of the workflow composition method previously used. Moving "into the wild" opens up an unprecedented wealth of tools and a huge number of alternative workflows. Automated composition tools can be used to explore this space of possibilities systematically. Inevitably, the mixed quality of semantic annotations in bio.tools leads to unintended or erroneous tool combinations. However, our results also show that additional control mechanisms (tool filters, configuration options, and workflow constraints) can effectively guide the exploration toward smaller sets of more meaningful workflows.
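As a rough illustration of why input/output annotations make automated composition possible, and why filters and constraints shrink the result set, the following minimal sketch enumerates tool chains by matching each tool's output type to the next tool's input type. It is not APE's algorithm or API. The tool names, the data-type labels, and the `compose` function with its `max_len` and `banned` parameters are all hypothetical and chosen only for illustration; they are not taken from bio.tools or EDAM.

```python
from collections import deque

# Hypothetical toy annotations: (tool name, input data type, output data type).
# These are illustrative labels, not real bio.tools entries or EDAM terms.
TOOLS = [
    ("peak_picker", "Mass spectrum", "Peak list"),
    ("db_search", "Peak list", "Peptide identification"),
    ("de_novo", "Peak list", "Peptide identification"),
    ("protein_inference", "Peptide identification", "Protein identification"),
    ("format_converter", "Peak list", "Peak list"),
]


def compose(source, target, max_len, banned=frozenset()):
    """Enumerate tool sequences that turn `source` into `target`.

    Breadth-first search, so shorter workflows are returned first.
    `max_len` bounds the workflow length; `banned` acts as a simple tool filter.
    """
    workflows = []
    queue = deque([(source, [])])
    while queue:
        current, path = queue.popleft()
        if current == target and path:
            workflows.append(path)
            continue
        if len(path) == max_len:
            continue
        for name, tool_in, tool_out in TOOLS:
            # A tool can follow only if its input matches the current data type.
            if tool_in == current and name not in banned:
                queue.append((tool_out, path + [name]))
    return workflows


unconstrained = compose("Mass spectrum", "Protein identification", max_len=4)
filtered = compose(
    "Mass spectrum",
    "Protein identification",
    max_len=4,
    banned=frozenset({"format_converter"}),
)
```

Even in this toy setting, a single loosely annotated tool (the hypothetical `format_converter`, whose input and output types coincide) adds redundant alternatives, and excluding it halves the number of candidate workflows. A registry-scale search faces the same effect with far more tools.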