Harnessing the complexity of gene expression data from cancer: from single gene to structural pathway methods.

Affiliation

Computational Biology and Machine Learning Laboratory, Queen's University Belfast, Belfast, UK.

Publication information

Biol Direct. 2012 Dec 10;7:44. doi: 10.1186/1745-6150-7-44.

DOI:10.1186/1745-6150-7-44

PMID:23227854

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3769148/

Abstract

High-dimensional gene expression data are a rich source of information because they capture the expression levels of genes in dynamic states that reflect the biological functioning of a cell. Such data are therefore well suited to revealing systems-related properties of a cell, for example to elucidate the molecular mechanisms of complex diseases such as breast or prostate cancer. However, what can be inferred depends strongly not only on the sample size and the correlation structure of a data set, but also on the statistical hypotheses tested. Many approaches have been developed over the years to analyze gene expression data in order to (I) identify changes in single genes, (II) identify changes in gene sets or pathways, and (III) identify changes in the correlation structure within pathways. In this paper, we review statistical methods for all three types of approaches, including their subtypes, in the context of cancer data, provide links to software implementations and tools, and address the general problem of multiple hypothesis testing. Further, we provide recommendations for selecting among these analysis methods.
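
The abstract raises multiple hypothesis testing without naming a specific procedure. As an illustration only, the sketch below applies the Benjamini-Hochberg adjustment, one widely used way to control the false discovery rate when every gene is tested separately, as in approach (I). The choice of procedure, the function name `benjamini_hochberg`, the gene names, the p-values, and the 0.05 cutoff are all assumptions made for this example and are not taken from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # stay monotone (a smaller raw p never gets a larger adjusted p).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values (made-up numbers, not from the paper).
gene_p_values = {"geneA": 0.001, "geneB": 0.02, "geneC": 0.03, "geneD": 0.8}
names = list(gene_p_values)
q_values = benjamini_hochberg([gene_p_values[g] for g in names])

# Genes whose adjusted p-value passes an assumed 0.05 FDR cutoff.
significant = [g for g, q in zip(names, q_values) if q <= 0.05]
print(significant)
```

The same adjustment could in principle be applied to per-pathway p-values from approaches (II) and (III), although correlated or overlapping gene sets may call for a procedure that tolerates dependence between tests.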
