An accessible infrastructure for artificial intelligence using a Docker-based JupyterLab in Galaxy.

Affiliations

Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany.

Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schaenzlestr. 18, 79104 Freiburg, Germany.

Publication information

Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad028. Epub 2023 Apr 26.

Abstract

BACKGROUND

Artificial intelligence (AI) programs that train on large datasets require powerful compute infrastructure consisting of several CPU cores and GPUs. JupyterLab provides an excellent framework for developing AI programs, but it needs to be hosted on such an infrastructure to enable faster training of AI programs using parallel computing.

FINDINGS

An open-source, Docker-based, GPU-enabled JupyterLab infrastructure was developed for rapidly prototyping and developing end-to-end AI projects. It runs on the public compute infrastructure of Galaxy Europe, which consists of thousands of CPU cores, many GPUs, and several petabytes of storage. From a JupyterLab notebook, long-running AI model training programs can also be executed remotely in Galaxy to create trained models, represented in Open Neural Network Exchange (ONNX) format, and other output datasets. Other features include Git integration for version control, the option of creating and executing pipelines of notebooks, multiple dashboards for monitoring compute resources, and packages for visualization.

CONCLUSIONS

These features make JupyterLab in Galaxy Europe highly suitable for creating and managing AI projects. A recent scientific publication that predicts infected regions in COVID-19 computed tomography scan images is reproduced using various features of JupyterLab on Galaxy Europe. In addition, ColabFold, a faster implementation of AlphaFold2, is accessed in JupyterLab to predict the 3-dimensional structure of protein sequences. JupyterLab is accessible in 2 ways: as an interactive Galaxy tool, or by running the underlying Docker container. In both cases, long-running training can be executed on Galaxy's compute infrastructure. Scripts to create the Docker container are available under the MIT license at https://github.com/usegalaxy-eu/gpu-jupyterlab-docker.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/476d/10132306/1c1579cb29f4/giad028fig1.jpg