Discovery of perturbation gene targets via free text metadata mining in Gene Expression Omnibus.

Author information

Victor Chang Cardiac Research Institute, Sydney, Australia; University of New South Wales, Sydney, Australia.

Victor Chang Cardiac Research Institute, Sydney, Australia.

Publication information

Comput Biol Chem. 2019 Jun;80:152-158. doi: 10.1016/j.compbiolchem.2019.03.014. Epub 2019 Mar 24.

Abstract

NCBI's Gene Expression Omnibus (GEO) database holds over 2.5 million publicly available gene expression samples across 101,000 data series. Because standardised ontology terms are not used in GEO's free text metadata to annotate experiment type and sample type, the database remains difficult to harness computationally without significant manual intervention. In this work, we present GEOracle, an interactive R/Shiny tool that uses text mining and machine learning techniques to automatically identify perturbation experiments, group treatment and control samples, and perform differential expression analysis. We present applications of GEOracle to discover conserved signalling pathway target genes and to identify an organ-specific gene regulatory network. GEOracle is effective in discovering perturbation gene targets in GEO by harnessing its free text metadata. Its effectiveness and applicability have been demonstrated by cross-validation and two real-life case studies. It opens up new avenues to unlock the gene regulatory information embedded in large biological databases such as GEO. GEOracle is available at https://github.com/VCCRI/GEOracle.
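The sample-grouping step mentioned in the abstract can be illustrated with a deliberately simple sketch. This is not GEOracle's implementation: the abstract says the tool uses text mining and machine learning, whereas the code below substitutes a fixed keyword lookup to show the general idea of sorting free text sample titles into control and perturbation groups. The keyword lists, function names, sample identifiers and sample titles are all hypothetical.

```python
import re

# Hypothetical keyword lists, chosen for illustration only.
# They are assumptions, not terms taken from GEOracle.
CONTROL_TERMS = {"control", "ctrl", "wt", "wildtype", "vehicle", "untreated", "mock"}
PERTURBATION_TERMS = {"ko", "knockout", "kd", "knockdown", "sirna", "shrna",
                      "overexpression", "treated", "mutant"}


def tokenise(title):
    """Lower-case a free text title and split it on any non-alphanumeric run."""
    return {tok for tok in re.split(r"[^a-z0-9]+", title.lower()) if tok}


def label_sample(title):
    """Return 'control', 'perturbation' or 'unknown' for one sample title.

    Control terms are checked first, so a title such as 'control siRNA'
    is treated as a control sample.
    """
    tokens = tokenise(title)
    if tokens & CONTROL_TERMS:
        return "control"
    if tokens & PERTURBATION_TERMS:
        return "perturbation"
    return "unknown"


def group_samples(titles):
    """Group sample identifiers by the label assigned to their titles."""
    groups = {"control": [], "perturbation": [], "unknown": []}
    for sample_id, title in titles.items():
        groups[label_sample(title)].append(sample_id)
    return groups


if __name__ == "__main__":
    # Made-up titles in the style of free text sample metadata.
    example = {
        "S1": "Heart_WT_rep1",
        "S2": "Heart_GeneX_KO_rep1",
        "S3": "Heart tissue, batch 2",
    }
    print(group_samples(example))
```

A fixed lookup like this breaks down quickly on real metadata, where titles are inconsistent and abbreviations vary between submitters. That limitation is presumably part of why the authors turn to machine learning and keep a manual review step in an interactive tool.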
