Analysis of cis-Regulatory Elements in Gene Co-expression Networks in Cancer.

Author information

Triska Martin, Ivliev Alexander, Nikolsky Yuri, Tatarinova Tatiana V

Affiliations

Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA.

Thomson Reuters, Boston, MA, USA.

Publication information

Methods Mol Biol. 2017;1613:291-310. doi: 10.1007/978-1-4939-7027-8_11.

Abstract

Analysis of gene co-expression networks is a powerful "data-driven" tool, invaluable for understanding cancer biology and mechanisms of tumor development. Yet, despite the completion of thousands of studies on cancer gene expression, there have been few attempts to normalize and integrate co-expression data from scattered sources in a concise "meta-analysis" framework. Here we describe an integrated approach to cancer expression meta-analysis, which combines generation of "data-driven" co-expression networks with detailed statistical detection of promoter sequence motifs within the co-expression clusters. First, we applied the Weighted Gene Co-Expression Network Analysis (WGCNA) workflow and Pearson's correlation to generate a comprehensive set of over 3000 co-expression clusters in 82 normalized microarray datasets from nine cancers of different origin. Next, we designed a genome-wide statistical approach to the detection of specific DNA sequence motifs based on similarities between the promoters of similarly expressed genes. The approach, implemented as the cisExpress software module, was specifically designed for analysis of very large data sets such as those generated by publicly accessible whole-genome and transcriptome projects. cisExpress uses a task farming algorithm to exploit all available computational cores within a shared memory node. We discovered that although co-expression modules are populated with different sets of genes, they share distinct, stable patterns of co-regulation based on promoter sequence analysis. The number of motifs per co-expression cluster varies widely with the cancer tissue of origin, with the largest number in colon (68 motifs) and the lowest in ovary (18 motifs). The top-scoring motifs are typically shared between several tissues; they define sets of target genes responsible for specific functions in carcinogenesis. Both the co-expression modules and a database of precalculated motifs are publicly available and accessible for further studies.
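As a rough illustration of the co-expression step mentioned in the abstract, the sketch below computes Pearson's correlation between gene expression profiles and raises its absolute value to a power, which is the usual form of an unsigned WGCNA-style adjacency. This is not the authors' pipeline: the function names, the toy expression values, and the exponent `beta=6` are all assumptions chosen for illustration, and the clustering and motif-detection steps are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned soft-threshold adjacency: a_ij = |cor(i, j)| ** beta.

    `expr` maps a gene name to its expression values across samples.
    `beta` is an assumed exponent, not a value taken from the abstract.
    """
    genes = sorted(expr)
    adjacency = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            adjacency[(g, h)] = abs(pearson(expr[g], expr[h])) ** beta
    return adjacency


# Hypothetical toy data: three genes measured in four samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
adjacency = soft_threshold_adjacency(expr)
```

In this toy input, geneA and geneB are perfectly correlated and keep a strong connection, while the weak correlation between geneA and geneC is pushed close to zero by the exponent. Genes with strong mutual connections are the ones a subsequent clustering step would group into a co-expression module.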
