
GAUGE-Annotated Microbial Transcriptomic Data Facilitate Parallel Mining and High-Throughput Reanalysis To Form Data-Driven Hypotheses.

Authors

Li Zhongyou, Koeppen Katja, Holden Victoria I, Neff Samuel L, Cengher Liviu, Demers Elora G, Mould Dallas L, Stanton Bruce A, Hampton Thomas H

Affiliation

Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA.

Publication information

mSystems. 2021 Mar 23;6(2):e01305-20. doi: 10.1128/mSystems.01305-20.

DOI:10.1128/mSystems.01305-20
PMID:33758032
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8547006/
Abstract

The NCBI Gene Expression Omnibus (GEO) provides tools to query and download transcriptomic data. However, less than 4% of microbial experiments include the sample group annotations required to assess differential gene expression in high-throughput reanalysis, and data deposited after 2014 universally lack these annotations. Our algorithm GAUGE (general annotation using text/data group ensembles) automatically annotates GEO microbial data sets, including microarray and RNA sequencing studies, increasing the percentage of data sets amenable to analysis from 4% to 33%. Eighty-nine percent of GAUGE-annotated studies matched group assignments generated by human curators. To demonstrate how GAUGE annotation can lead to scientific insight, we created GAPE (GAUGE-annotated Pseudomonas aeruginosa and Escherichia coli transcriptomic compendia for reanalysis), a Shiny Web interface to analyze 73 GAUGE-annotated studies, three times more than previously available. GAPE analysis revealed that a gene of unknown function was differentially expressed in more than 50% of studies and significantly coregulated with genes involved in biofilm formation. Follow-up wet-bench experiments demonstrate that mutants of this gene are indeed defective in biofilm formation, consistent with predictions facilitated by GAUGE and GAPE. We anticipate that GAUGE and GAPE, which we have made freely available, will make publicly available microbial transcriptomic data easier to reuse and lead to new data-driven hypotheses.

GEO archives transcriptomic data from over 5,800 microbial experiments and allows researchers to answer questions not directly addressed in published papers. However, less than 4% of the microbial data sets include the sample group annotations required for high-throughput reanalysis. This limitation prevents a considerable amount of microbial transcriptomic data from being reused easily. Here, we demonstrate that the GAUGE algorithm can make 33% of microbial data accessible to parallel mining and reanalysis. GAUGE annotations increase statistical power and thereby make consistent patterns of differential gene expression easier to identify. In addition, we developed GAPE, a Shiny Web interface that performs parallel analyses on the P. aeruginosa and E. coli compendia. Source code for GAUGE and GAPE is freely available and can be repurposed to create compendia for other bacterial species.
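To make the idea of "sample group annotation" concrete, the following is a minimal sketch and not the GAUGE algorithm itself, whose details are not given in this abstract. It assumes, purely for illustration, that replicate samples share a title apart from a trailing replicate label. The function names `group_samples_by_title` and `is_reanalyzable`, the suffix pattern, and the example titles are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical heuristic: a replicate label is a separator, an optional
# "rep"/"replicate" word, and a trailing number, e.g. " rep1", "_2", " replicate 3".
REPLICATE_SUFFIX = re.compile(
    r"[\s_\-,#]+(?:(?:biological\s+)?rep(?:licate)?[\s_\-#]*)?\d+\s*$",
    re.IGNORECASE,
)


def group_samples_by_title(titles):
    """Group sample titles that differ only by a trailing replicate label."""
    groups = defaultdict(list)
    for title in titles:
        # Strip the replicate label and normalize case to get a group key.
        key = REPLICATE_SUFFIX.sub("", title).strip().lower()
        groups[key].append(title)
    return dict(groups)


def is_reanalyzable(groups, min_groups=2, min_replicates=2):
    """Return True if at least `min_groups` groups each have enough replicates.

    This is an assumed, simplified criterion for "amenable to differential
    expression analysis"; the thresholds are illustrative defaults.
    """
    usable = [g for g in groups.values() if len(g) >= min_replicates]
    return len(usable) >= min_groups


if __name__ == "__main__":
    # Made-up sample titles in the style of a GEO series.
    titles = [
        "PAO1 wild type rep1",
        "PAO1 wild type rep2",
        "PAO1 tobramycin rep1",
        "PAO1 tobramycin rep2",
    ]
    groups = group_samples_by_title(titles)
    print(groups)
    print(is_reanalyzable(groups))
```

A title-only heuristic like this fails whenever depositors label replicates inconsistently. Taking the abstract's rounded figures at face value, 4% and 33% of 5,800 experiments correspond to roughly 232 and 1,914 data sets, respectively.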

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9338/8547006/f399cc601788/msystems.01305-20_f001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9338/8547006/89e0723c17ac/msystems.01305-20_f002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9338/8547006/e5d521688c11/msystems.01305-20_f003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9338/8547006/e6d14e46cd9c/msystems.01305-20_f004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9338/8547006/18543d32f938/msystems.01305-20_f005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9338/8547006/aac61e1d6bde/msystems.01305-20_f006.jpg
