Restructured GEO: restructuring Gene Expression Omnibus metadata for genome dynamics analysis.

Affiliations

School of Biomedical Informatics, University of Texas Health Science Center at Houston (UTHealth), Houston, Texas, USA.

Universidad Antonio Nariño, Bogotá, Colombia.

Publication information

Database (Oxford). 2019 Jan 1;2019:bay145. doi: 10.1093/database/bay145.

DOI:10.1093/database/bay145

PMID:30649296

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6333964/

Abstract

MOTIVATION

Gene Expression Omnibus (GEO) and other publicly available data repositories store their metadata as unstructured English text, which makes automated reuse very difficult.

RESULTS

We employed text mining techniques to analyze the metadata of GEO and developed the Restructured GEO database (ReGEO). ReGEO reorganizes and categorizes GEO series and makes them searchable by two new attributes extracted automatically from each series' metadata: the number of time points tested in the experiment and the disease being investigated. ReGEO also makes series searchable by other attributes available in GEO, such as platform organism, experiment type and associated PubMed ID, as well as by general keywords in the study's description. Our approach greatly expands the usability of GEO data and demonstrates a credible way to improve the utility of the vast amount of publicly available data in the era of Big Data research.
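To make the idea of extracting a "number of time points" attribute from free-text metadata concrete, here is a minimal Python sketch. It is not the ReGEO pipeline, whose methods the abstract does not describe. It assumes time points appear as a number, or a comma/"and"-separated list of numbers, followed by a time unit. The function name `extract_time_points`, the regular expression and the unit table are all hypothetical choices made for illustration.

```python
import re

# Assumption: a time point is written as a number (or a list of numbers
# joined by commas / "and") followed by a unit, e.g. "0, 6, 12 and 24 hours".
NUMBER = r"\d+(?:\.\d+)?"
TIME_LIST = re.compile(
    rf"\b({NUMBER}(?:\s*(?:,\s*and|,|and)\s*{NUMBER})*)\s*"
    r"(hours?|hrs?|h|days?|d|weeks?|wk)\b",
    re.IGNORECASE,
)

# Hypothetical unit table used to normalise every value to hours.
HOURS_PER_UNIT = {"h": 1, "hr": 1, "hour": 1, "d": 24, "day": 24, "wk": 168, "week": 168}


def extract_time_points(text):
    """Return the sorted, de-duplicated time points (in hours) found in text."""
    points = set()
    for numbers, unit in TIME_LIST.findall(text):
        # Strip a plural "s" so that "hours" and "hour" share one table entry.
        factor = HOURS_PER_UNIT[unit.lower().rstrip("s")]
        for value in re.findall(NUMBER, numbers):
            points.add(float(value) * factor)
    return sorted(points)


summary = "Blood was drawn at 0, 6, 12 and 24 hours and again at 2 days after infection."
time_points = extract_time_points(summary)
print(len(time_points), time_points)
```

The length of the returned list would serve as the searchable attribute. A pattern this simple misses phrasings such as "every 8 hours for three days" or spelled-out numbers, so a real system would need considerably more robust text mining.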
