IEEE/ACM Trans Comput Biol Bioinform. 2020 Mar-Apr;17(2):449-458. doi: 10.1109/TCBB.2018.2846648. Epub 2018 Jun 12.
Partial correlation (PC) and conditional mutual information (CMI) are widely used to detect direct dependencies between observed variables in biological networks by eliminating indirect correlations or associations, but both fail whenever a network contains strong correlations. In this paper, we theoretically develop a multiscale association analysis to overcome this flaw. We propose a new measure, partial association (PA), based on multiscale conditional mutual information. We show that linear PA and nonlinear PA have clear advantages over PC and CMI in both theoretical and computational respects. Results on both simulated models and real omics datasets demonstrate that PA is more accurate than PC and CMI, and that it is a powerful tool for identifying direct associations and reconstructing molecular networks from observed data. Survival and functional analyses of the hub genes in gene networks reconstructed from TCGA data for different cancers also validate the effectiveness of our method.
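As background for the abstract above, the way partial correlation removes an indirect association can be illustrated with a minimal sketch. It covers only the standard first-order PC baseline, not the PA measure proposed in the paper. The chain X → Z → Y, the noise levels, the sample size, and the helper names `pearson` and `partial_correlation` are all assumptions chosen for illustration.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y given a single variable z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy data: X influences Y only through Z (X -> Z -> Y).
random.seed(0)
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
z = [xi + 0.5 * random.gauss(0, 1) for xi in x]
y = [zi + 0.5 * random.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)                         # marginal correlation, inflated by the path through Z
r_xy_given_z = partial_correlation(x, y, z)  # correlation left after conditioning on Z

print(f"corr(X, Y)      = {r_xy:.3f}")
print(f"pcorr(X, Y | Z) = {r_xy_given_z:.3f}")
```

In this toy setting the marginal correlation between X and Y is large while the partial correlation given Z is close to zero, which is the behaviour PC-based network reconstruction relies on when it prunes indirect edges.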