Global Pricing and Market Access, F. Hoffmann-La Roche Ltd, CH-4070 Basel, Switzerland.
Lancaster Medical School, Lancaster University, Lancaster, LA1 4YQ, UK.
Bioinformatics. 2017 Sep 15;33(18):2890-2896. doi: 10.1093/bioinformatics/btx322.
Molecular pathways and networks play a key role in basic and disease biology. An emerging notion is that networks encoding patterns of molecular interplay may themselves differ between contexts, such as cell type, tissue or disease (sub)type. However, while statistical testing of differences in mean expression levels has been extensively studied, testing of network differences remains challenging. Furthermore, since network differences could provide important and biologically interpretable information to identify molecular subgroups, there is a need to consider the unsupervised task of learning subgroups and networks that define them. This is a nontrivial clustering problem, with neither subgroups nor subgroup-specific networks known at the outset.
We leverage recent ideas from high-dimensional statistics for testing and clustering in the network biology setting. The methods we describe can be applied directly to most continuous molecular measurements and networks do not need to be specified beforehand. We illustrate the ideas and methods in a case study using protein data from The Cancer Genome Atlas (TCGA). This provides evidence that patterns of interplay between signalling proteins differ significantly between cancer types. Furthermore, we show how the proposed approaches can be used to learn subtypes and the molecular networks that define them.
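To make the idea of testing for a network difference concrete, the following is a minimal sketch and not the procedure implemented in nethet. It assumes a deliberately simple substitute technique: networks are summarised by partial correlations from a ridge-regularised covariance estimate, and the two groups are compared with a permutation test on the Frobenius norm of the difference. The function names, the `ridge` value and the number of permutations are all illustrative assumptions.

```python
import numpy as np


def partial_correlations(x, ridge=0.1):
    """Partial-correlation matrix from a ridge-regularised covariance.

    x has shape (n_samples, n_variables). The ridge term is an
    illustrative assumption that keeps the covariance invertible.
    """
    cov = np.cov(x, rowvar=False)
    cov = cov + ridge * np.eye(cov.shape[0])
    prec = np.linalg.inv(cov)
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def network_difference(x1, x2, ridge=0.1):
    """Frobenius norm of the difference between the two group networks."""
    diff = partial_correlations(x1, ridge) - partial_correlations(x2, ridge)
    return float(np.linalg.norm(diff, ord="fro"))


def permutation_test(x1, x2, n_perm=200, ridge=0.1, seed=0):
    """Permutation p-value for the null of no network difference.

    Group labels are shuffled n_perm times; the p-value is the share of
    shuffled statistics at least as large as the observed one (with the
    usual +1 correction).
    """
    rng = np.random.default_rng(seed)
    observed = network_difference(x1, x2, ridge)
    pooled = np.vstack([x1, x2])
    n1 = x1.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        stat = network_difference(pooled[idx[:n1]], pooled[idx[n1:]], ridge)
        exceed += int(stat >= observed)
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Simulated data (not TCGA): group A has one strong edge, group B has none.
    cov_a = np.eye(5)
    cov_a[0, 1] = cov_a[1, 0] = 0.8
    group_a = rng.multivariate_normal(np.zeros(5), cov_a, size=200)
    group_b = rng.multivariate_normal(np.zeros(5), np.eye(5), size=200)
    stat, p = permutation_test(group_a, group_b)
    print(f"statistic={stat:.3f}, p-value={p:.4f}")
```

A permutation test is used here only because it is easy to state and verify; it does not scale to the high-dimensional setting the abstract targets, where sparse network estimation is needed.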
The methods are available as the Bioconductor package nethet.
Contact: staedler.n@gmail.com or sach.mukherjee@dzne.de.
Supplementary data are available at Bioinformatics online.