Semantic workflows for benchmark challenges: Enhancing comparability, reusability and reproducibility.

Authors

Srivastava Arunima, Adusumilli Ravali, Boyce Hunter, Garijo Daniel, Ratnakar Varun, Mayani Rajiv, Yu Thomas, Machiraju Raghu, Gil Yolanda, Mallick Parag

Affiliation

Computer Science and Engineering, The Ohio State University, 2015 Neil Ave, Columbus, OH 43210, USA.

Publication

Pac Symp Biocomput. 2019;24:208-219.

Abstract

Benchmark challenges, such as the Critical Assessment of Structure Prediction (CASP) and Dialogue for Reverse Engineering Assessments and Methods (DREAM), have been instrumental in driving the development of bioinformatics methods. Typically, challenges are posted, and competitors then make predictions based on blinded test data. Challengers submit their answers to a central server where they are scored. Recent efforts to automate these challenges have been enabled by systems in which challengers submit Docker containers, units of software that package code and all of its dependencies, to be run on the cloud. Although benchmark challenges are highly valuable as an unbiased test bed for the bioinformatics community, opportunities remain to enhance their impact further. Specifically, current approaches only evaluate end-to-end performance; it is nearly impossible to directly compare methodologies or parameters. Furthermore, the scientific community cannot easily reuse challengers' approaches, due to lack of specifics, ambiguity in tools and parameters, and problems in sharing and maintenance. Lastly, the intuition behind why particular steps are used is not captured, because the proposed workflows are not explicitly defined, making it cumbersome to understand the flow and utilization of data. Here we introduce an approach to overcome these limitations based upon the WINGS semantic workflow system. Specifically, WINGS enables researchers to submit complete semantic workflows as challenge submissions. When entries are submitted as workflows, it becomes possible to compare not just the results and performance of a challenger, but also the methodology employed. This is particularly important when dozens of challenge entries may use nearly identical tools with only subtle changes in parameters (and radical differences in results).
WINGS uses a component-driven workflow design and offers intelligent parameter and data selection by reasoning about data characteristics. This proves especially critical in bioinformatics workflows, where using default or incorrect parameter values can drastically alter results. Different challenge entries may be readily compared through the use of abstract workflows, which also facilitate reuse. WINGS is housed on a cloud-based setup, which stores data, dependencies and workflows for easy sharing and use. It can also scale workflow executions using distributed computing through the Pegasus workflow execution system. We demonstrate the application of this architecture to the DREAM proteogenomic challenge.
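
The idea of comparing entries at the level of methodology, rather than only by final score, can be illustrated with a minimal Python sketch. It is not the WINGS API or data model: the `Step` class, the component names (`normalize`, `impute`, `predict`) and the parameter names are all hypothetical, chosen only to show how two entries that share the same abstract workflow might differ in a single parameter.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One workflow step: a component name plus its parameter settings.

    Hypothetical structure for illustration; not the WINGS data model.
    """
    component: str
    params: dict = field(default_factory=dict)


def abstract_view(workflow):
    """Drop parameters and keep only the ordered component names."""
    return [step.component for step in workflow]


def parameter_differences(entry_a, entry_b):
    """List (component, parameter, value_a, value_b) for settings that differ.

    Both entries must share the same abstract workflow, i.e. the same
    components in the same order.
    """
    if abstract_view(entry_a) != abstract_view(entry_b):
        raise ValueError("entries do not share an abstract workflow")
    diffs = []
    for a, b in zip(entry_a, entry_b):
        for name in sorted(set(a.params) | set(b.params)):
            value_a, value_b = a.params.get(name), b.params.get(name)
            if value_a != value_b:
                diffs.append((a.component, name, value_a, value_b))
    return diffs


# Two made-up challenge entries using identical tools but one different setting.
entry_a = [
    Step("normalize", {"method": "quantile"}),
    Step("impute", {"k": 5}),
    Step("predict", {"alpha": 0.1}),
]
entry_b = [
    Step("normalize", {"method": "quantile"}),
    Step("impute", {"k": 10}),
    Step("predict", {"alpha": 0.1}),
]

print(abstract_view(entry_a))
print(parameter_differences(entry_a, entry_b))
```

In this sketch the two entries reduce to the same abstract workflow, so the only reported difference is the `k` setting of the `impute` step. With container-only submissions, that kind of difference would stay hidden behind an end-to-end score.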

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed03/6417805/17b0a9d09432/nihms-999795-f0001.jpg
