Predictive analyses of regulatory sequences with EUGENe.

Author affiliations

Department of Medicine, University of California San Diego, La Jolla, CA, USA.

Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA.

Publication information

Nat Comput Sci. 2023 Nov;3(11):946-956. doi: 10.1038/s43588-023-00544-w. Epub 2023 Nov 16.

DOI:10.1038/s43588-023-00544-w

PMID:38177592

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10768637/

Abstract

Deep learning has become a popular tool to study cis-regulatory function. Yet efforts to design software for deep-learning analyses in regulatory genomics that are findable, accessible, interoperable and reusable (FAIR) have fallen short of fully meeting these criteria. Here we present elucidating the utility of genomic elements with neural nets (EUGENe), a FAIR toolkit for the analysis of genomic sequences with deep learning. EUGENe consists of a set of modules and subpackages for executing the key functionality of a genomics deep learning workflow: (1) extracting, transforming and loading sequence data from many common file formats; (2) instantiating, initializing and training diverse model architectures; and (3) evaluating and interpreting model behavior. We designed EUGENe as a simple, flexible and extensible interface for streamlining and customizing end-to-end deep-learning sequence analyses, and illustrate these principles through application of the toolkit to three predictive modeling tasks. We hope that EUGENe represents a springboard towards a collaborative ecosystem for deep-learning applications in genomics research.
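The three workflow stages named in the abstract (loading and encoding sequences, applying a model, and interpreting its behavior) can be illustrated with a minimal sketch. This is plain standard-library Python and not the EUGENe API: the function names, the hypothetical "GATA" motif, and the weight-matrix scorer that stands in for a trained neural network are all assumptions made for illustration.

```python
# Toy illustration only: plain Python, NOT the EUGENe API.
# A fixed motif weight matrix stands in for a trained model (assumption).
ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (A, C, G, T); unknown bases become all zeros."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq.upper()]

def scan_score(encoded, weights):
    """Slide the weight matrix along the encoded sequence and return the best window score."""
    width = len(weights)
    best = float("-inf")
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * weights[i][j]
            for i in range(width)
            for j in range(4)
        )
        best = max(best, score)
    return best

def in_silico_mutagenesis(seq, weights):
    """Return the score change for every single-base substitution, keyed by (position, new_base)."""
    reference = scan_score(one_hot(seq), weights)
    effects = {}
    for pos, ref_base in enumerate(seq):
        for alt in ALPHABET:
            if alt == ref_base:
                continue
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, alt)] = scan_score(one_hot(mutant), weights) - reference
    return effects

# Hypothetical 4-bp motif: +1 for the preferred base at each position, -1 otherwise.
MOTIF = "GATA"
WEIGHTS = [[1.0 if letter == base else -1.0 for letter in ALPHABET] for base in MOTIF]

sequence = "TTGATACC"                                   # stage 1: load and encode
reference_score = scan_score(one_hot(sequence), WEIGHTS)  # stage 2: apply the stand-in model
effects = in_silico_mutagenesis(sequence, WEIGHTS)        # stage 3: interpret
most_damaging = min(effects, key=effects.get)
```

In this sketch, substitutions inside the motif lower the score while substitutions outside it leave the score unchanged, which is the kind of signal that in silico mutagenesis is meant to expose when it is applied to a real trained model.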

Similar articles

Predictive analyses of regulatory sequences with EUGENe.

Nat Comput Sci. 2023 Nov;3(11):946-956. doi: 10.1038/s43588-023-00544-w. Epub 2023 Nov 16.

A Data Transformation Methodology to Create Findable, Accessible, Interoperable, and Reusable Health Data: Software Design, Development, and Evaluation Study.

J Med Internet Res. 2023 Mar 8;25:e42822. doi: 10.2196/42822.

FAIR data retrieval for sensitive clinical research data in Galaxy.

Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giad099.

SciPipe: A workflow library for agile development of complex and dynamic bioinformatics pipelines.

Gigascience. 2019 May 1;8(5). doi: 10.1093/gigascience/giz044.

Deep learning for genomics using Janggu.

Nat Commun. 2020 Jul 13;11(1):3488. doi: 10.1038/s41467-020-17155-y.

Enabling precision medicine via standard communication of HTS provenance, analysis, and results.

PLoS Biol. 2018 Dec 31;16(12):e3000099. doi: 10.1371/journal.pbio.3000099. eCollection 2018 Dec.

JWES: a new pipeline for whole genome/exome sequence data processing, management, and gene-variant discovery, annotation, prediction, and genotyping.

FEBS Open Bio. 2021 Sep;11(9):2441-2452. doi: 10.1002/2211-5463.13261. Epub 2021 Aug 11.

EuGene-PP: a next-generation automated annotation pipeline for prokaryotic genomes.

Bioinformatics. 2014 Sep 15;30(18):2659-61. doi: 10.1093/bioinformatics/btu366. Epub 2014 May 30.

Custom Biomedical FAIR Data Analysis in the Cloud Using CAVATICA.

medRxiv. 2024 Jun 28:2024.06.27.24309340. doi: 10.1101/2024.06.27.24309340.

Recommendations for the FAIRification of genomic track metadata.

F1000Res. 2021 Apr 1;10. doi: 10.12688/f1000research.28449.1. eCollection 2021.

Cited by

Analysis-ready VCF at Biobank scale using Zarr.

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf049.

AI-powered precision medicine: utilizing genetic risk factor optimization to revolutionize healthcare.

NAR Genom Bioinform. 2025 May 5;7(2):lqaf038. doi: 10.1093/nargab/lqaf038. eCollection 2025 Jun.

AutoXAI4Omics: an automated explainable AI tool for omics and tabular data.

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae593.

Enhancers in Plant Development, Adaptation and Evolution.

Plant Cell Physiol. 2025 May 17;66(4):461-476. doi: 10.1093/pcp/pcae121.

Decoding biology with massively parallel reporter assays and machine learning.

Genes Dev. 2024 Oct 16;38(17-20):843-865. doi: 10.1101/gad.351800.124.

Analysis-ready VCF at Biobank scale using Zarr.

bioRxiv. 2025 Feb 6:2024.06.11.598241. doi: 10.1101/2024.06.11.598241.

Arabidopsis and Maize Terminator Strength is Determined by GC Content, Polyadenylation Motifs and Cleavage Probability.

bioRxiv. 2024 Jan 8:2023.06.16.545379. doi: 10.1101/2023.06.16.545379.

References

SpatialData: an open and universal data framework for spatial omics.

Nat Methods. 2025 Jan;22(1):58-62. doi: 10.1038/s41592-024-02212-x. Epub 2024 Mar 20.

GraphPart: homology partitioning for biological sequence analysis.

NAR Genom Bioinform. 2023 Oct 16;5(4):lqad088. doi: 10.1093/nargab/lqad088. eCollection 2023 Dec.

Epiphany: predicting Hi-C contact maps from 1D epigenomic signals.

Genome Biol. 2023 Jun 6;24(1):134. doi: 10.1186/s13059-023-02934-9.

Correcting gradient-based interpretations of deep neural networks for genomics.

Genome Biol. 2023 May 9;24(1):109. doi: 10.1186/s13059-023-02956-3.

Cell-type-specific prediction of 3D chromatin organization enables high-throughput in silico genetic screening.

Nat Biotechnol. 2023 Aug;41(8):1140-1150. doi: 10.1038/s41587-022-01612-8. Epub 2023 Jan 9.

Introducing the FAIR Principles for research software.

Sci Data. 2022 Oct 14;9(1):622. doi: 10.1038/s41597-022-01710-x.

Obtaining genetics insights from deep learning via explainable artificial intelligence.

Nat Rev Genet. 2023 Feb;24(2):125-137. doi: 10.1038/s41576-022-00532-2. Epub 2022 Oct 3.

scBasset: sequence-based modeling of single-cell ATAC-seq using convolutional neural networks.

Nat Methods. 2022 Sep;19(9):1088-1096. doi: 10.1038/s41592-022-01562-8. Epub 2022 Aug 8.

A sequence-based global map of regulatory activity for deciphering human genetics.

Nat Genet. 2022 Jul;54(7):940-949. doi: 10.1038/s41588-022-01102-2. Epub 2022 Jul 11.

Sequence-based modeling of three-dimensional genome architecture from kilobase to chromosome scale.

Nat Genet. 2022 May;54(5):725-734. doi: 10.1038/s41588-022-01065-4. Epub 2022 May 12.
