Division of Biostatistics, Institute of Epidemiology and Preventive Medicine, National Taiwan University, No. 17, Xu-Zhou Road, Taipei, 10055, Taiwan.
Institute of Statistical Science, Academia Sinica, Taipei, 11529, Taiwan.
BMC Bioinformatics. 2020 Mar 12;21(1):101. doi: 10.1186/s12859-020-3444-7.
To identify and prioritize the influential hub genes in a gene-set or biological pathway, most analyses rely on calculation of marginal effects or tests of statistical significance. These procedures may be inappropriate since hub nodes are common connection points and therefore may interact with other nodes more often than non-hub nodes do. Such dependence among gene nodes can be conjectured based on the topology of the pathway network or the correlation between them.
Here we develop a pathway activity score incorporating the marginal (local) effects of gene nodes as well as intra-network affinity measures. For each sample, this score summarizes the expression levels in a gene-set/pathway, with separate weights on local and network information. The score is then used to examine the impact of each node through a leave-one-out evaluation. To illustrate the procedure and assess its performance, we use two cancer studies: one with RNA-Seq data from breast cancer patients with high-grade ductal carcinoma in situ, and one with microarray expression data from ovarian cancer patients. We compare the procedure with existing methods, including both those that take correlation and network information into account and those that do not. The hub nodes identified by the proposed procedure in the two cancer studies are known influential genes; some have been included in standard treatments and some are currently being considered in clinical trials for targeted therapy. The results from simulation studies show that when marginal effects are mild or weak, the proposed procedure can still identify causal nodes, whereas methods relying only on marginal effect size cannot.
The NetworkHub procedure proposed in this research can effectively utilize the network information in combination with local effects derived from marker values, and provide a useful and complementary list of recommendations for prioritizing causal hubs.
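The general idea of a weighted pathway score followed by a leave-one-out evaluation can be sketched as below. This is only an illustration under stated assumptions, not the NetworkHub implementation: the abstract does not give the formulas, so the choice of Pearson correlation as the local effect, mean absolute correlation as the affinity measure, the mixing parameter `alpha`, and every function name here are hypothetical.

```python
import numpy as np

def local_effects(X, y):
    """Signed Pearson correlation of each gene (column of X) with outcome y.
    Assumption: correlation stands in for the marginal (local) effect."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())

def network_affinity(X):
    """Mean absolute correlation of each gene with the other genes.
    Assumption: a correlation-based stand-in for intra-network affinity."""
    C = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(C, 0.0)
    return C.sum(axis=1) / (C.shape[0] - 1)

def gene_weights(X, y, alpha=0.5):
    """Mix local and network information; alpha is a hypothetical tuning weight.
    The sign of the local effect is kept so that genes with opposite
    directions of effect do not cancel in the score."""
    r = local_effects(X, y)
    return np.sign(r) * (alpha * np.abs(r) + (1 - alpha) * network_affinity(X))

def pathway_score(X, weights):
    """One score per sample: weighted sum of standardized expression."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return Z @ weights

def association(score, y):
    """Absolute correlation between the pathway score and the outcome."""
    return abs(np.corrcoef(score, y)[0, 1])

def leave_one_out_impact(X, y, alpha=0.5):
    """Drop in score-outcome association when each gene is removed.
    Weights are computed once on the full gene set and reused."""
    w = gene_weights(X, y, alpha)
    full = association(pathway_score(X, w), y)
    impact = np.empty(X.shape[1])
    for g in range(X.shape[1]):
        keep = np.arange(X.shape[1]) != g
        impact[g] = full - association(pathway_score(X[:, keep], w[keep]), y)
    return impact

if __name__ == "__main__":
    # Simulated data only: gene 0 tracks the outcome, genes 1-4 are noise.
    rng = np.random.default_rng(0)
    y = rng.normal(size=200)
    X = rng.normal(size=(200, 5))
    X[:, 0] = y + 0.1 * rng.normal(size=200)
    print(np.argsort(-leave_one_out_impact(X, y)))  # genes ranked by impact
```

Genes whose removal causes the largest drop in association would be ranked first. Reusing the full-set weights in the leave-one-out step is a simplification chosen here for brevity; recomputing them on each reduced gene set is an equally plausible design.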