iModulonMiner and PyModulon: Software for unsupervised mining of gene expression compendia.

Affiliations

Department of Bioengineering, University of California, San Diego, La Jolla, California, United States of America.

Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, California, United States of America.

Publication information

PLoS Comput Biol. 2024 Oct 23;20(10):e1012546. doi: 10.1371/journal.pcbi.1012546. eCollection 2024 Oct.

Abstract

Public gene expression databases are a rapidly expanding resource of organism responses to diverse perturbations, presenting both an opportunity and a challenge for bioinformatics workflows to extract actionable knowledge of transcriptional regulatory network function. Here, we introduce a five-step computational pipeline, called iModulonMiner, to compile, process, curate, analyze, and characterize the totality of RNA-seq data for a given organism or cell type. This workflow is centered around the data-driven computation of co-regulated gene sets, called iModulons, using Independent Component Analysis; iModulons have been shown to have broad applications. As a demonstration, we applied this workflow to generate the iModulon structure of Bacillus subtilis using all high-quality, publicly available RNA-seq data. Using this structure, we predicted regulatory interactions for multiple transcription factors, identified groups of co-expressed genes that are putatively regulated by undiscovered transcription factors, and predicted properties of a recently discovered single-subunit phage RNA polymerase. We also present a Python package, PyModulon, with functions to characterize, visualize, and explore computed iModulons. The pipeline, available at https://github.com/SBRG/iModulonMiner, can be readily applied to diverse organisms to gain a rapid understanding of their transcriptional regulatory network structure and condition-specific activity.
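To make the idea of an iModulon more concrete, the following toy sketch shows the kind of output an Independent Component Analysis decomposition produces and how gene sets could be read from it. It is an illustration under stated assumptions, not the iModulonMiner or PyModulon implementation: the gene names, the matrix values, the fixed threshold of 0.5, and the helper functions `matmul` and `gene_set` are all hypothetical. In a real analysis the two matrices would be estimated from the expression compendium by ICA, and the cutoff would be chosen per component rather than fixed.

```python
# Toy illustration with made-up numbers (not data from the paper).
# ICA models an expression matrix X (genes x samples) as X ≈ M · A, where
#   M (genes x components) holds each gene's weight in each component, and
#   A (components x samples) holds each component's activity in each sample.

def matmul(m, a):
    """Multiply a (genes x components) matrix by a (components x samples) matrix."""
    return [
        [sum(m_row[k] * a[k][j] for k in range(len(a))) for j in range(len(a[0]))]
        for m_row in m
    ]

def gene_set(genes, m, component, threshold):
    """Return the genes whose absolute weight in one component exceeds the threshold."""
    return [g for g, row in zip(genes, m) if abs(row[component]) > threshold]

# Hypothetical gene identifiers.
genes = ["geneA", "geneB", "geneC", "geneD"]

# Hypothetical gene weights: 4 genes x 2 components.
M = [
    [0.9, 0.0],
    [0.8, 0.1],
    [0.0, -0.7],
    [0.1, 0.0],
]

# Hypothetical component activities: 2 components x 3 samples.
A = [
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 1.0],
]

# Expression implied by the model for each gene in each sample.
X = matmul(M, A)

# Genes with large weights in a component form that component's gene set.
# The fixed cutoff of 0.5 is an assumption made for this sketch.
imodulon_0 = gene_set(genes, M, component=0, threshold=0.5)
imodulon_1 = gene_set(genes, M, component=1, threshold=0.5)

print(imodulon_0)
print(imodulon_1)
```

In this sketch, each row of `A` plays the role of the condition-specific activity mentioned in the abstract: a sample in which a component's activity is high is one in which the genes of that set are jointly up- or down-regulated, according to the sign of their weights.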

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/56f3/11534266/6041bad8f2cb/pcbi.1012546.g001.jpg
