Ma Xin, Kundu Suprateek, Stevens Jennifer
Department of Biostatistics and Bioinformatics, Emory University, Atlanta, USA.
Department of Biostatistics, The University of Texas at MD Anderson Cancer Center, Houston, USA.
Mach Learn. 2022 Oct;111(10):3733-3767. doi: 10.1007/s10994-022-06174-z. Epub 2022 Jun 2.
Although there has been an explosive rise in network data in a variety of disciplines, there is very limited development of regression modeling approaches based on high-dimensional networks. The scarce literature in this area typically assumes linear relationships between the outcome and the high-dimensional network edges, which results in inflated models plagued by the curse of dimensionality; these models are also unable to accommodate non-linear relationships or higher-order interactions. To overcome these limitations, we develop a novel two-stage Bayesian non-parametric regression modeling framework using high-dimensional networks as covariates, which first finds a lower-dimensional node-specific representation for the networks, and then embeds these representations in a flexible Gaussian process regression framework along with supplemental covariates for modeling the continuous outcome variable. Moving from an edge-level analysis to a node-level model allows us to scale up to high-dimensional networks, and enables node selection via an extension of the Gaussian process framework that involves spike-and-slab priors on the lengthscale parameters. Extensive simulations show a distinct advantage of the proposed approach in terms of prediction, coverage, and node selection. The proposed model achieves considerable gains when predicting posttraumatic stress disorder (PTSD) resilience based on brain networks in our motivating neuroimaging applications, and also identifies important brain regions associated with PTSD. In contrast, existing non-linear approaches that employ the full edge set, or that use other dimension reduction techniques on the network, are not equipped for node selection and result in poor prediction and poor characterization of predictive uncertainty, while linear approaches using the edge-level features are overly inflated and typically result in poor performance.
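The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Stage 1 here is a swapped-in technique: it projects each network onto the leading eigenvectors of the mean network (the abstract does not specify the embedding). Stage 2 uses a Gaussian process kernel with one inverse lengthscale per node, fixed by hand; in the paper a spike-and-slab prior would infer which nodes are switched off. The function names `node_features`, `node_kernel`, and `gp_predict`, the noise variance, and the simulated data are all hypothetical.

```python
import numpy as np


def node_features(networks, k):
    """Stage 1 (assumed embedding): map (n, V, V) symmetric networks to
    (n, V, k) node-level features using a basis shared across subjects."""
    mean_net = networks.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(mean_net)
    # Keep the k eigenvectors with the largest absolute eigenvalues.
    top = np.argsort(np.abs(eigvals))[::-1][:k]
    basis = eigvecs[:, top]              # shape (V, k)
    return networks @ basis              # row v = features of node v


def node_kernel(xa, xb, inv_lengthscale):
    """Stage 2 kernel: squared-exponential with one inverse lengthscale
    per node. A value of exactly 0 removes that node from the model."""
    diff = xa[:, None, :, :] - xb[None, :, :, :]   # (na, nb, V, k)
    sq_dist = (diff ** 2).sum(axis=-1)             # (na, nb, V)
    return np.exp(-(sq_dist * inv_lengthscale).sum(axis=-1))


def gp_predict(x_train, y_train, x_test, inv_lengthscale, noise_var=0.1):
    """Posterior mean of a zero-mean GP with Gaussian noise."""
    k_tt = node_kernel(x_train, x_train, inv_lengthscale)
    k_st = node_kernel(x_test, x_train, inv_lengthscale)
    alpha = np.linalg.solve(k_tt + noise_var * np.eye(len(y_train)), y_train)
    return k_st @ alpha


# Simulated data, for illustration only.
rng = np.random.default_rng(0)
n, V, k = 40, 6, 2
raw = rng.normal(size=(n, V, V))
networks = (raw + raw.transpose(0, 2, 1)) / 2      # symmetrize
feats = node_features(networks, k)

# Hand-set inverse lengthscales: only nodes 0 and 1 are "selected".
rho = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
y = np.sin(feats[:, 0, 0]) + 0.1 * rng.normal(size=n)
pred = gp_predict(feats[:30], y[:30], feats[30:], rho)
print(pred.shape)
```

Working at the node level means the kernel carries V selection parameters rather than one per edge (V(V-1)/2), which is the scaling argument the abstract makes. Setting an entry of `rho` to zero makes the kernel ignore that node's features entirely.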