Tool recommender system in Galaxy using deep learning.

Affiliations

Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany.

Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schaenzlestr. 18, 79104 Freiburg, Germany.

Publication information

Gigascience. 2021 Jan 6;10(1). doi: 10.1093/gigascience/giaa152.

Abstract

BACKGROUND

Galaxy is a web-based, open-source scientific data-processing platform. Researchers compose pipelines in Galaxy to analyse scientific data. These pipelines, also known as workflows, can be complex and difficult to assemble from thousands of tools, especially for researchers new to Galaxy. To help researchers create workflows, a system is developed that recommends tools that can facilitate further data analysis.

FINDINGS

A model is developed to recommend tools using a deep learning approach, by analysing workflows composed by researchers on the European Galaxy server. The higher-order dependencies in workflows, which are represented as directed acyclic graphs, are learned by training a gated recurrent unit (GRU) neural network, a variant of a recurrent neural network. During training, the weights of the tools are derived from their usage frequencies over time, and sequences of tools are uniformly sampled from the training data. The hyperparameters of the neural network are optimized using Bayesian optimization. A mean accuracy of 98% in recommending tools is achieved for the top-1 metric.
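The sketch below illustrates, under stated assumptions, how workflow DAGs could be turned into training samples and how a top-k accuracy could be scored. It is not the authors' code (their scripts are in the repository named under CONCLUSIONS). The function names (`workflow_paths`, `prefix_targets`, `usage_weights`, `top_k_accuracy`), the toy workflow, the made-up scores, and the log-scaled usage weighting are all hypothetical. The sketch assumes three things. Each workflow is decomposed into source-to-sink tool paths. Each path prefix is one training input. Its label is the set of tools observed right after that prefix, so a branching step can have several correct next tools. The GRU model and the Bayesian hyperparameter search are omitted.

```python
import math
from collections import defaultdict


def workflow_paths(edges):
    """Enumerate every source-to-sink tool path in a workflow DAG.

    `edges` is a list of (upstream_tool, downstream_tool) pairs; the graph
    is assumed to be acyclic.
    """
    children = defaultdict(list)
    has_parent = set()
    nodes = set()
    for src, dst in edges:
        children[src].append(dst)
        has_parent.add(dst)
        nodes.update((src, dst))

    paths = []

    def walk(node, prefix):
        path = prefix + [node]
        if not children[node]:  # sink: no downstream tool
            paths.append(path)
            return
        for child in children[node]:
            walk(child, path)

    # Start a walk at every tool with no incoming edge.
    for source in sorted(nodes - has_parent):
        walk(source, [])
    return paths


def prefix_targets(paths):
    """Map each path prefix to the set of tools observed right after it."""
    targets = defaultdict(set)
    for path in paths:
        for i in range(1, len(path)):
            targets[tuple(path[:i])].add(path[i])
    return dict(targets)


def usage_weights(usage_counts):
    """Hypothetical per-tool weight: 1 + log(1 + count).

    Popular tools weigh more without swamping rare ones; unused tools keep 1.
    """
    return {tool: 1.0 + math.log1p(n) for tool, n in usage_counts.items()}


def top_k_accuracy(scores_by_prefix, targets, k=1):
    """Fraction of prefixes whose k best-scored tools include a valid next tool."""
    hits = 0
    for prefix, valid_next in targets.items():
        scores = scores_by_prefix[prefix]
        ranked = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += any(tool in valid_next for tool in ranked)
    return hits / len(targets)


if __name__ == "__main__":
    # Toy workflow (illustrative only) with two independent branches.
    edges = [
        ("trimmomatic", "bowtie2"),
        ("bowtie2", "featurecounts"),
        ("bowtie2", "samtools_stats"),
        ("fastqc", "multiqc"),
    ]
    targets = prefix_targets(workflow_paths(edges))
    # Made-up scores standing in for the outputs of a trained model.
    scores = {
        ("fastqc",): {"multiqc": 0.9, "bowtie2": 0.1},
        ("trimmomatic",): {"bowtie2": 0.7, "fastqc": 0.3},
        ("trimmomatic", "bowtie2"): {"multiqc": 0.6, "featurecounts": 0.4},
    }
    print(top_k_accuracy(scores, targets, k=1))  # 2 of 3 prefixes hit
    print(top_k_accuracy(scores, targets, k=2))  # all 3 prefixes hit
```

Treating the label as a set of valid next tools means a prediction counts as correct whenever it matches any tool that follows the prefix in the data. Enumerating every path can grow quickly on heavily branched workflows, so a real pipeline would likely cap or deduplicate paths.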

CONCLUSIONS

The model is accessed by a Galaxy API to provide researchers with recommended tools in an interactive manner using multiple user interface integrations on the European Galaxy server. High-quality and highly used tools are shown at the top of the recommendations. The scripts and data to create the recommendation system are available under MIT license at https://github.com/anuprulez/galaxy_tool_recommendation.
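High-quality and highly used tools appear at the top of the recommendations. One simple way to express such an ordering is a composite sort key. The sketch below is an illustrative assumption and not the server's actual logic. The `rank_recommendations` function, the `high_quality` set, the usage counts, the scores, and the priority order are all hypothetical. That order is quality flag first, then model score, then usage.

```python
def rank_recommendations(scores, usage, high_quality, top_n=5):
    """Hypothetical ordering of recommended tools.

    scores:       tool -> model score for the current workflow prefix
    usage:        tool -> usage count (tools missing here count as 0)
    high_quality: set of tools flagged as high quality (assumed input)
    """
    def sort_key(tool):
        # Tuples compare element by element: flag, then score, then usage.
        return (tool in high_quality, scores[tool], usage.get(tool, 0))

    return sorted(scores, key=sort_key, reverse=True)[:top_n]


if __name__ == "__main__":
    scores = {"featurecounts": 0.5, "htseq_count": 0.9, "samtools_stats": 0.5}
    usage = {"featurecounts": 10, "samtools_stats": 50}
    print(rank_recommendations(scores, usage, high_quality={"featurecounts"}))
    # → ['featurecounts', 'htseq_count', 'samtools_stats']
```

Placing the flag first makes any flagged tool outrank every unflagged one regardless of score. A weighted sum of score and log-usage would be a softer alternative.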

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a70a/7786169/299302457294/giaa152fig1.jpg
