Identifying multi-layer gene regulatory modules from multi-dimensional genomic data.

Affiliation

Program in Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA.

Publication

Bioinformatics. 2012 Oct 1;28(19):2458-66. doi: 10.1093/bioinformatics/bts476. Epub 2012 Aug 3.

Abstract

MOTIVATION

Eukaryotic gene expression (GE) is subject to precisely coordinated multi-layer control across the epigenetic, transcriptional and post-transcriptional levels of regulation. Recently, emerging multi-dimensional genomic datasets have provided unprecedented opportunities to study cross-layer regulatory interplay. In these datasets, the same set of samples is profiled on several layers of genomic activity, e.g. copy number variation (CNV), DNA methylation (DM), GE and microRNA expression (ME). However, suitable analysis methods for such data are currently scarce.

RESULTS

In this article, we introduce a sparse Multi-Block Partial Least Squares (sMBPLS) regression method to identify multi-dimensional regulatory modules from this new type of data. A multi-dimensional regulatory module contains sets of regulatory factors from different layers that are likely to jointly contribute to a local 'gene expression factory'. We demonstrate the performance of our method on simulated data as well as on The Cancer Genome Atlas (TCGA) ovarian cancer datasets, which include CNV, DM, ME and GE data measured on 230 samples. We show that the majority of identified modules have significant functional and transcriptional enrichment, higher than that observed in modules identified using only a single type of genomic data. Our network analysis of the modules reveals that CNV, DM and microRNAs can have a coupled impact on the expression of important oncogenes and tumor suppressor genes.
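The abstract names sMBPLS but does not give its objective or update rules. Purely to illustrate the general idea, the following is a minimal pure-Python sketch of one sparse multi-block PLS-style component. Each regulator layer (e.g. CNV, DM) gets its own sparse weight vector, the expression block gets a sparse gene weight vector, and the two are linked through shared sample scores. The nonzero weights across all blocks then form one candidate "module". This is not the authors' MATLAB implementation. The alternating soft-thresholding updates, the unweighted sum of block scores, the uniform starting weights, the penalty values and the names `sparse_multiblock_pls`, `block_lams` and `y_lam` are all assumptions made for this example, and the toy data are synthetic.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # rows are samples, columns are features
Vector = List[float]


def soft_threshold(x: Vector, lam: float) -> Vector:
    """Elementwise soft-thresholding: sign(x) * max(|x| - lam, 0)."""
    return [math.copysign(max(abs(xi) - lam, 0.0), xi) for xi in x]


def normalize(x: Vector) -> Vector:
    """Scale to unit Euclidean norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x] if norm > 0 else x


def mat_vec(X: Matrix, w: Vector) -> Vector:
    """Compute X w (one score per sample)."""
    return [sum(xij * wj for xij, wj in zip(row, w)) for row in X]


def mat_t_vec(X: Matrix, s: Vector) -> Vector:
    """Compute X^T s (one covariance-like value per feature)."""
    return [sum(X[i][j] * s[i] for i in range(len(X))) for j in range(len(X[0]))]


def sparse_multiblock_pls(blocks: Sequence[Matrix], Y: Matrix,
                          block_lams: Sequence[float], y_lam: float,
                          max_iter: int = 100, tol: float = 1e-9
                          ) -> Tuple[List[Vector], Vector]:
    """Fit ONE sparse multi-block PLS-style component (hypothetical sketch).

    Alternates between sparse regulator weights u_b = S(X_b^T Y v) for each
    block and sparse gene weights v = S(Y^T t), where S is soft-thresholding
    followed by normalization and t = sum_b X_b u_b is the combined score.
    """
    q = len(Y[0])
    v = [1.0 / math.sqrt(q)] * q  # uniform starting gene weights (assumption)
    us: List[Vector] = []
    for _ in range(max_iter):
        y_score = mat_vec(Y, v)
        us = [normalize(soft_threshold(mat_t_vec(Xb, y_score), lam))
              for Xb, lam in zip(blocks, block_lams)]
        block_scores = [mat_vec(Xb, ub) for Xb, ub in zip(blocks, us)]
        t = [sum(per_sample) for per_sample in zip(*block_scores)]  # unweighted sum
        v_new = normalize(soft_threshold(mat_t_vec(Y, t), y_lam))
        converged = max(abs(old - new) for old, new in zip(v, v_new)) < tol
        v = v_new
        if converged:
            break
    return us, v


def support(w: Vector) -> List[int]:
    """Indices of nonzero weights, i.e. features selected into the module."""
    return [i for i, wi in enumerate(w) if wi != 0.0]


# Toy data: 4 samples built from mutually orthogonal +/-1 patterns.
z = [1.0, -1.0, 1.0, -1.0]  # shared signal across layers
a = [1.0, 1.0, -1.0, -1.0]  # unrelated pattern
c = [1.0, -1.0, -1.0, 1.0]  # another unrelated pattern


def columns(*cols: Vector) -> Matrix:
    """Stack column vectors into a samples-by-features matrix."""
    return [list(row) for row in zip(*cols)]


cnv = columns(z, a)                    # CNV feature 0 tracks z
dm = columns(c, [-zi for zi in z])     # DM feature 1 is anti-correlated with z
ge = columns(z, z, [0.5 * ci + 0.1 * zi for ci, zi in zip(c, z)])

us, v = sparse_multiblock_pls([cnv, dm], ge, block_lams=[2.0, 2.0], y_lam=2.0)
print(support(us[0]), support(us[1]), support(v))  # [0] [1] [0, 1]
```

In this toy case the selected module is CNV feature 0, DM feature 1 (with a negative weight, reflecting the anti-correlation built into the data) and genes 0 and 1. Gene 2 is weakly related to the shared signal and is dropped by the gene penalty. Raising a block's penalty shrinks that layer's contribution to the module.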

AVAILABILITY AND IMPLEMENTATION

The source code, implemented in MATLAB, is freely available at: http://zhoulab.usc.edu/sMBPLS/.

CONTACT

xjzhou@usc.edu

SUPPLEMENTARY INFORMATION

Supplementary material is available at Bioinformatics online.
