Department of Medicine, University of California San Diego, La Jolla, CA, USA.
Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA.
Nat Comput Sci. 2023 Nov;3(11):946-956. doi: 10.1038/s43588-023-00544-w. Epub 2023 Nov 16.
Deep learning has become a popular tool to study cis-regulatory function. Yet efforts to design software for deep-learning analyses in regulatory genomics that are findable, accessible, interoperable and reusable (FAIR) have fallen short of fully meeting these criteria. Here we present elucidating the utility of genomic elements with neural nets (EUGENe), a FAIR toolkit for the analysis of genomic sequences with deep learning. EUGENe consists of a set of modules and subpackages for executing the key functionality of a genomics deep learning workflow: (1) extracting, transforming and loading sequence data from many common file formats; (2) instantiating, initializing and training diverse model architectures; and (3) evaluating and interpreting model behavior. We designed EUGENe as a simple, flexible and extensible interface for streamlining and customizing end-to-end deep-learning sequence analyses, and illustrate these principles through application of the toolkit to three predictive modeling tasks. We hope that EUGENe represents a springboard towards a collaborative ecosystem for deep-learning applications in genomics research.
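As a rough illustration of the kind of sequence transformation that step (1) of the workflow refers to, the sketch below one-hot encodes a DNA string and computes its reverse complement in plain Python. The function names `one_hot_encode` and `reverse_complement`, the `ACGT` column order, and the all-zero handling of ambiguous bases are assumptions made for this example. They are not taken from EUGENe's API, which the abstract does not describe.

```python
# Illustrative sketch only: these helpers are hypothetical and are NOT EUGENe functions.
ALPHABET = "ACGT"  # assumed column order for the encoding


def one_hot_encode(seq, alphabet=ALPHABET):
    """Return a len(seq) x len(alphabet) list of lists.

    Bases outside the alphabet (for example N) are encoded as all-zero rows.
    """
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = []
    for base in seq.upper():
        row = [0] * len(alphabet)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N maps to N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


encoded = one_hot_encode("ACGTN")
rc = reverse_complement("AACG")
```

A matrix of this shape is a common input representation for sequence-based neural networks. Reverse-complement copies are often used for data augmentation, though whether and how a given toolkit does so depends on its own implementation.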