EvoAug-TF: extending evolution-inspired data augmentations for genomic deep learning to TensorFlow.

Affiliations

Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, United States.

Commack High School, Commack, NY 11725, United States.

Publication information

Bioinformatics. 2024 Mar 4;40(3). doi: 10.1093/bioinformatics/btae092.

DOI:10.1093/bioinformatics/btae092
PMID:38366935
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10918628/
Abstract

SUMMARY

Deep neural networks (DNNs) have been widely applied to predict the molecular functions of the non-coding genome. DNNs are data hungry and thus require many training examples to fit data well. However, functional genomics experiments typically generate limited amounts of data, constrained by the activity levels of the molecular function under study inside the cell. Recently, EvoAug was introduced to train a genomic DNN with evolution-inspired augmentations. EvoAug-trained DNNs have demonstrated improved generalization and improved interpretability under attribution analysis. However, EvoAug only supports PyTorch-based models, which limits its application to the broad class of genomic DNNs built in TensorFlow. Here, we extend EvoAug's functionality to TensorFlow in a new package that we call EvoAug-TF. Through a systematic benchmark, we find that EvoAug-TF yields performance comparable to that of the original EvoAug package.

AVAILABILITY AND IMPLEMENTATION

EvoAug-TF is freely available and is distributed under an open-source MIT license. The source code is on GitHub (https://github.com/p-koo/evoaug-tf). The pre-compiled package is provided via PyPI (https://pypi.org/project/evoaug-tf), with in-depth documentation on ReadTheDocs (https://evoaug-tf.readthedocs.io). The scripts for reproducing the results are available at https://github.com/p-koo/evoaug-tf_analysis.
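To make the idea of evolution-inspired augmentation concrete, the following is a minimal sketch in plain Python. It is not the EvoAug-TF API: the function names (`random_mutation`, `reverse_complement`, `augment`), the parameters, and the choice of these two perturbations are illustrative assumptions, since the abstract does not enumerate the package's augmentations. The sketch assumes sequences contain only the characters A, C, G, and T.

```python
import random

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def random_mutation(seq, rate, rng):
    """Replace each base, with probability `rate`, by a different random base."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Choose among the three bases that differ from the current one.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def augment(seq, rng, mutate_rate=0.05, rc_prob=0.5):
    """Apply point mutations, then reverse-complement with probability `rc_prob`.

    Hypothetical helper: the rates are arbitrary illustration values.
    """
    seq = random_mutation(seq, mutate_rate, rng)
    if rng.random() < rc_prob:
        seq = reverse_complement(seq)
    return seq


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same output
    print(augment("ACGTACGTAC", rng))
```

In a training loop, a function like `augment` would typically be applied to each sequence on the fly, so the model sees a different perturbed copy every epoch while the label stays unchanged. A real pipeline would operate on one-hot tensors rather than strings; refer to the package documentation linked above for the actual interface.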

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/20be/10918628/a9823574bdac/btae092f1.jpg
