Restructured GEO: restructuring Gene Expression Omnibus metadata for genome dynamics analysis.

Affiliations

School of Biomedical Informatics, University of Texas Health Science Center at Houston (UTHealth), Houston, Texas, USA.

Universidad Antonio Nariño, Bogotá, Colombia.

Publication information

Database (Oxford). 2019 Jan 1;2019:bay145. doi: 10.1093/database/bay145.

Abstract

MOTIVATION

Gene Expression Omnibus (GEO) and other public data resources store their metadata as unstructured English text, which makes automated reuse very difficult.

RESULTS

We employed text mining techniques to analyze the metadata of GEO and developed the Restructured GEO database (ReGEO). ReGEO reorganizes and categorizes GEO series and makes them searchable by two new attributes extracted automatically from each series' metadata: the number of time points tested in the experiment and the disease being investigated. ReGEO also makes series searchable by other attributes already available in GEO, such as platform organism, experiment type, and associated PubMed ID, as well as by general keywords in the study's description. Our approach greatly expands the usability of GEO data and demonstrates a credible way to improve the utility of the vast amount of publicly available data in the era of Big Data research.
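
The abstract does not say how ReGEO extracts the two new attributes, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a simple regular expression for time expressions and a small, hypothetical disease vocabulary; the names `count_time_points`, `find_diseases`, and `TIME_POINT` are invented for this example.

```python
import re

# Assumed pattern (not from the paper): a number followed by a time unit,
# e.g. "6 h", "24 hours", "3 days", "30 min", "2 weeks".
TIME_POINT = re.compile(
    r"\b(\d+(?:\.\d+)?)\s*"
    r"(h|hr|hrs|hours?|d|days?|min|mins|minutes?|wk|wks|weeks?)\b",
    re.IGNORECASE,
)

# Minutes per unit, so that "1 day" and "24 h" count as the same time point.
UNIT_MINUTES = {"min": 1, "h": 60, "d": 1440, "w": 10080}


def unit_key(unit: str) -> str:
    """Map a matched unit spelling to a key of UNIT_MINUTES."""
    unit = unit.lower()
    if unit.startswith("min"):
        return "min"
    return unit[0]  # 'h', 'd', or 'w'


def count_time_points(text: str) -> int:
    """Count distinct time points mentioned in free-text metadata."""
    minutes = set()
    for value, unit in TIME_POINT.findall(text):
        minutes.add(float(value) * UNIT_MINUTES[unit_key(unit)])
    return len(minutes)


def find_diseases(text: str, vocabulary: list[str]) -> list[str]:
    """Return vocabulary terms that occur as whole words in the text."""
    lowered = text.lower()
    return sorted(
        term
        for term in vocabulary
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
    )


if __name__ == "__main__":
    # Made-up series summary, used only to exercise the functions.
    summary = (
        "Breast cancer cells were harvested at 0 h, 6 h, 24 hours "
        "and 1 day after treatment."
    )
    print(count_time_points(summary))
    print(find_diseases(summary, ["breast cancer", "asthma"]))
```

A real pipeline would need a curated disease ontology and far more robust handling of time expressions (ranges, written-out numbers, per-sample fields); this sketch only shows how two structured, searchable attributes could be derived from unstructured text.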
