Grechkin Maxim, Logsdon Benjamin A, Gentles Andrew J, Lee Su-In
Department of Computer Science & Engineering, University of Washington, Seattle, Washington, United States of America.
Sage Bionetworks, Seattle, Washington, United States of America.
PLoS Comput Biol. 2016 May 4;12(5):e1004888. doi: 10.1371/journal.pcbi.1004888. eCollection 2016 May.
We present a computational framework, called DISCERN (DIfferential SparsE Regulatory Network), to identify informative topological changes in gene-regulator dependence networks inferred from mRNA expression datasets in distinct biological states. DISCERN takes two expression datasets as input: one from diseased tissues of patients with a disease of interest and one from matching normal tissues. DISCERN estimates the extent to which each gene is perturbed, that is, has distinct regulator connectivity in the gene-regulator dependencies inferred for the disease and normal conditions. This approach has two distinct advantages over existing methods. First, DISCERN infers conditional dependencies between candidate regulators and genes; conditional dependence relationships discriminate evidence for direct interactions from evidence for indirect interactions more precisely than pairwise correlation does. Second, DISCERN uses a new likelihood-based scoring function to alleviate concerns about the accuracy of the specific edges inferred in a particular network. In synthetic data, DISCERN identifies perturbed genes more accurately than existing methods for detecting genes perturbed between distinct states. In expression datasets from patients with acute myeloid leukemia (AML), breast cancer and lung cancer, genes with high DISCERN scores in each cancer are enriched for known tumor drivers, for genes associated with biological processes known to be important in that disease, and for genes associated with patient prognosis in the respective cancer. Finally, using the epigenomic data available from the ENCODE project, we show that DISCERN can uncover potential mechanisms underlying network perturbation: it explains observed epigenomic activity patterns in cancer and normal tissue types more accurately than alternative methods.
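The idea of scoring a gene by how poorly a sparse regulator model learned in one condition transfers to the other can be illustrated with a small sketch. This is not the authors' implementation and not the exact DISCERN score: the cross-condition error ratio below is an assumed stand-in for a likelihood-based score (under Gaussian noise, squared error equals the negative log-likelihood up to constants), the lasso solver is a plain coordinate-descent routine, and all names (`lasso_cd`, `perturbation_score`, `lam`) and the synthetic data are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent: min (1/2n)||y - Xw||^2 + lam*||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            resid += X[:, j] * w[j]            # drop regulator j's current contribution
            rho = X[:, j] @ resid / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft threshold
            resid -= X[:, j] * w[j]            # add the updated contribution back
    return w

def mse(X, y, w):
    """Mean squared prediction error of weights w on data (X, y)."""
    return float(np.mean((y - X @ w) ** 2))

def perturbation_score(X_dis, y_dis, X_norm, y_norm, lam=0.05):
    """Assumed score: cross-condition error divided by within-condition error.

    Values near 1 suggest the same regulators explain the gene in both
    conditions; large values suggest the regulator connectivity differs.
    """
    w_dis = lasso_cd(X_dis, y_dis, lam)
    w_norm = lasso_cd(X_norm, y_norm, lam)
    cross = mse(X_dis, y_dis, w_norm) + mse(X_norm, y_norm, w_dis)
    within = mse(X_dis, y_dis, w_dis) + mse(X_norm, y_norm, w_norm)
    return cross / within

# Synthetic example: 5 candidate regulators, 200 samples per condition.
rng = np.random.default_rng(0)
n, p, noise = 200, 5, 0.1
X_dis = rng.standard_normal((n, p))
X_norm = rng.standard_normal((n, p))
w_a = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # gene driven by regulator 0
w_b = np.array([0.0, 0.0, 1.0, 0.0, 0.0])   # gene driven by regulator 2

# Stable gene: regulator 0 in both conditions.
score_stable = perturbation_score(
    X_dis, X_dis @ w_a + noise * rng.standard_normal(n),
    X_norm, X_norm @ w_a + noise * rng.standard_normal(n))

# Perturbed gene: regulator 0 in disease, regulator 2 in normal tissue.
score_perturbed = perturbation_score(
    X_dis, X_dis @ w_a + noise * rng.standard_normal(n),
    X_norm, X_norm @ w_b + noise * rng.standard_normal(n))

print(f"stable gene score:    {score_stable:.2f}")
print(f"perturbed gene score: {score_perturbed:.2f}")
```

Because the score compares predictive fit and not individual inferred edges, it does not require every edge of either network to be correct, which is the motivation the abstract gives for a likelihood-based score. The L1 penalty keeps each gene's regulator set sparse, so each retained regulator is selected conditionally on the others.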