IEEE/ACM Trans Comput Biol Bioinform. 2018 Nov-Dec;15(6):1991-1998. doi: 10.1109/TCBB.2018.2858755. Epub 2018 Jul 23.
In this article, we present a computational framework for identifying "causal relationships" among super gene sets. By "causal relationships," we mean both stimulatory and inhibitory regulatory relationships, whether they act through direct or indirect mechanisms. By super gene sets, we mean "pathways, annotated lists, and gene signatures," or PAGs. To identify causal relationships among PAGs, we extend previous work on identifying PAG-to-PAG regulatory relationships by further requiring that each relationship be significantly enriched with gene-to-gene co-expression pairs spanning the two PAGs involved. To this end, we develop a quantitative metric based on PAG-to-PAG Co-expressions (PPC), which we use to infer the likelihood that a PAG-to-PAG relationship under examination is causal, either stimulatory or inhibitory. Since true causal relationships are unknown, we approximate the overall performance of causal inference by how well the causal PAG-to-PAG inference recalls known r-type PAG-to-PAG relationships, using a functional genomics benchmark dataset from the GEO database. We report an area-under-curve (AUC) of 0.81 for both precision and recall. By applying our framework to a myeloid-derived suppressor cell (MDSC) dataset, we further demonstrate that it is effective in helping build multi-scale biomolecular systems models that offer new insights into regulatory and causal links for downstream biological interpretation.
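The abstract does not give the PPC formula, so the following is only a minimal sketch of the general idea of scoring co-expression across two gene sets. It is not the authors' metric. The function names `pearson` and `cross_pag_coexpression`, the 0.8 correlation cutoff, the toy expression values, and the mapping of positive correlation to "stimulatory" and negative correlation to "inhibitory" are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return sxy / math.sqrt(sxx * syy)

def cross_pag_coexpression(expr, pag_a, pag_b, threshold=0.8):
    """Hypothetical score: fraction of cross-set gene pairs whose absolute
    correlation reaches `threshold`, plus the dominant sign of those pairs.

    expr  : dict mapping gene name -> list of expression values per sample
    pag_a : gene names in the first gene set
    pag_b : gene names in the second gene set
    """
    # Only pairs that span the two sets, skipping genes shared by both
    # and genes without expression data.
    pairs = [(a, b) for a in pag_a for b in pag_b
             if a != b and a in expr and b in expr]
    if not pairs:
        return 0.0, "undetermined"
    pos = neg = 0
    for a, b in pairs:
        r = pearson(expr[a], expr[b])
        if r >= threshold:
            pos += 1
        elif r <= -threshold:
            neg += 1
    score = (pos + neg) / len(pairs)
    if pos > neg:
        sign = "stimulatory"   # assumed mapping, not from the source
    elif neg > pos:
        sign = "inhibitory"    # assumed mapping, not from the source
    else:
        sign = "undetermined"
    return score, sign

# Toy expression profiles (invented values, four samples per gene).
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
    "g4": [1, 3, 2, 4],
}
score, sign = cross_pag_coexpression(expr, ["g1", "g2"], ["g3"])
print(score, sign)
```

A raw fraction like this does not test significance. A fuller treatment would compare the observed count of co-expressed cross pairs against a background distribution, for example by permuting gene labels, before calling a relationship enriched.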