Batch-normalization of cerebellar and medulloblastoma gene expression datasets utilizing empirically defined negative control genes.

Affiliations

Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Rudbeck Laboratory, Uppsala University, Uppsala, Sweden.

Division for Biology and Bioinformatics, School of Bioscience, The Systems Biology Research Centre, University of Skövde, Skövde, Sweden.

Publication information

Bioinformatics. 2019 Sep 15;35(18):3357-3364. doi: 10.1093/bioinformatics/btz066.

Abstract

MOTIVATION

Medulloblastoma (MB) is a brain cancer that arises predominantly in children. Roughly 70% of patients are cured today, but survivors often suffer from severe sequelae. MB has been studied extensively by molecular profiling, but often in small and scattered cohorts. To improve cure rates and reduce treatment side effects, accurately integrating these data to increase analytical power will be important, if not essential.

RESULTS

We integrated 23 transcriptomic datasets spanning 1350 MB and 291 normal brain samples. To remove batch effects, we combined the Removal of Unwanted Variation (RUV) method with a novel pipeline for determining empirical negative control genes and a panel of metrics for evaluating normalization performance. This approach removed the majority of batch effects, producing a large-scale, integrative dataset of MB and cerebellar expression data. The proposed strategy will be broadly applicable to the accurate integration of data and the incorporation of normal reference samples in studies of various diseases. We hope that the integrated dataset will advance current MB research by enabling more large-scale gene expression analyses.
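The RESULTS paragraph combines three ingredients: RUV, empirically chosen negative control genes, and metrics for judging the normalization. The following is a minimal sketch of how these pieces can fit together on synthetic data; it is not the authors' pipeline. It assumes an RUVg-style correction (one member of the RUV family; the abstract does not state which variant or how many factors were used), a hypothetical control-selection rule that keeps the genes least associated with the biological grouping in a first-pass Welch t screen, and a single hypothetical metric (mean fraction of per-gene variance explained by batch) standing in for the paper's panel. The names `empirical_controls`, `ruvg` and `batch_r2` are illustrative.

```python
import numpy as np


def empirical_controls(Y, group, n_ctl):
    """Pick the n_ctl genes least associated with a two-level biological grouping.

    Hypothetical rule: rank genes by |Welch t| between the two groups and keep
    the smallest. Y is (n_samples, n_genes); group holds 0/1 labels.
    """
    Y = np.asarray(Y, dtype=float)
    group = np.asarray(group)
    a, b = Y[group == 0], Y[group == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / se
    return np.argsort(np.abs(t))[:n_ctl]


def ruvg(Y, ctl, k):
    """RUVg-style correction: estimate k unwanted factors from control genes.

    Model: Y = W @ alpha + X @ beta + noise, with beta = 0 for control genes,
    so the control-gene submatrix carries only unwanted variation W.
    Returns the corrected matrix and the estimated factors W.
    """
    Y = np.asarray(Y, dtype=float)
    mu = Y.mean(axis=0)
    Yc = Y - mu  # center each gene so W describes variation, not mean level
    U, s, _ = np.linalg.svd(Yc[:, ctl], full_matrices=False)
    W = U[:, :k] * s[:k]  # leading k left singular vectors, scaled
    alpha, *_ = np.linalg.lstsq(W, Yc, rcond=None)  # fit every gene on W
    return Yc - W @ alpha + mu, W


def batch_r2(Y, batch):
    """Mean fraction of per-gene variance explained by batch (one-way ANOVA R^2)."""
    Y = np.asarray(Y, dtype=float)
    batch = np.asarray(batch)
    grand = Y.mean(axis=0)
    total = ((Y - grand) ** 2).sum(axis=0)
    between = np.zeros(Y.shape[1])
    for lvl in np.unique(batch):
        grp = Y[batch == lvl]
        between += len(grp) * (grp.mean(axis=0) - grand) ** 2
    return float(np.mean(between / np.maximum(total, 1e-12)))


# Synthetic example: 3 batches, 2 biological groups balanced within each batch.
rng = np.random.default_rng(0)
n, g = 60, 300
batch = np.repeat([0, 1, 2], n // 3)
group = np.tile([0, 1], n // 2)
shift = rng.normal(0.0, 2.0, size=(3, g))  # gene-specific batch offsets
bio = np.zeros(g)
bio[100:150] = 3.0                         # only these genes respond to group
Y = rng.normal(0.0, 0.5, size=(n, g)) + shift[batch] + np.outer(group, bio)

ctl = empirical_controls(Y, group, n_ctl=100)
Y_ruv, W = ruvg(Y, ctl, k=2)  # 3 batches span 2 centered batch directions
print(f"batch R^2 before: {batch_r2(Y, batch):.3f}, after: {batch_r2(Y_ruv, batch):.3f}")
```

On this toy data the batch R^2 drops sharply after correction while the simulated group difference in the responsive genes survives, because the grouping is balanced across batches. With real data, control genes that do respond to the biology, or batches confounded with biological groups, would cause RUV to remove biological signal as well, which is why the choice of negative controls and an independent evaluation of the result matter.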

AVAILABILITY AND IMPLEMENTATION

The RUV-normalized expression data are available from the Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) under series accession number GSE124814.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.