Wang Wen, Wu Shihao, Zhu Ziwei, Zhou Ling, Song Peter X-K
Department of Biostatistics, University of Michigan, Ann Arbor.
Department of Statistics, University of Michigan, Ann Arbor.
Ann Stat. 2024 Feb;52(1):285-310. doi: 10.1214/23-aos2347. Epub 2024 Mar 7.
Fusing regression coefficients into homogeneous groups can reveal which coefficients share a common value within each group. Such groupwise homogeneity reduces the intrinsic dimension of the parameter space and yields sharper statistical accuracy. We propose and investigate a new combinatorial grouping approach called ℓ0-Fusion that is amenable to mixed integer optimization (MIO). On the statistical side, we identify a fundamental quantity called grouping sensitivity that underpins the difficulty of recovering the true groups. We show that ℓ0-Fusion achieves grouping consistency under the weakest possible requirement on the grouping sensitivity: if this requirement is violated, then the minimax risk of group misspecification fails to converge to zero. Moreover, we show that in the high-dimensional regime, ℓ0-Fusion can be applied to a sure screening set of features without any essential loss of statistical efficiency, while substantially reducing the computational cost. On the algorithmic side, we provide an MIO formulation for ℓ0-Fusion along with a warm-start strategy. Simulation and real data analyses demonstrate that ℓ0-Fusion outperforms its competitors in grouping accuracy.
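The abstract does not state the optimization problem explicitly. The following is a minimal sketch that illustrates the idea of combinatorial grouping. It assumes that the grouping problem can be posed as least squares over coefficient vectors whose p entries take exactly K distinct values. This formulation is an assumption made here, not the paper's stated definition. The sketch also solves the problem by exhaustive enumeration of all groupings, not by the MIO formulation and warm start the authors propose. The names partitions_into_k and l0_fusion_bruteforce are hypothetical, and the simulated data are made up.

```python
import numpy as np


def partitions_into_k(p, k):
    """Yield label tuples that split p coefficients into exactly k non-empty
    groups. Labels follow restricted-growth order, so each grouping is
    produced exactly once."""
    def rec(i, labels, used):
        if i == p:
            if used == k:
                yield tuple(labels)
            return
        if k - used > p - i:  # too few items left to open the missing groups
            return
        for g in range(min(used + 1, k)):  # reuse an existing group or open the next one
            labels.append(g)
            yield from rec(i + 1, labels, max(used, g + 1))
            labels.pop()

    yield from rec(0, [], 0)


def l0_fusion_bruteforce(X, y, k):
    """Hypothetical exhaustive solver (assumed formulation): least squares with
    beta restricted to exactly k distinct values, over every grouping of columns."""
    p = X.shape[1]
    best = (np.inf, None, None)
    for labels in partitions_into_k(p, k):
        Z = np.zeros((p, k))
        Z[np.arange(p), np.array(labels)] = 1.0  # membership matrix: beta = Z @ gamma
        gamma, *_ = np.linalg.lstsq(X @ Z, y, rcond=None)  # one common value per group
        beta = Z @ gamma
        rss = float(np.sum((y - X @ beta) ** 2))
        if rss < best[0]:
            best = (rss, beta, labels)
    return best


# Simulated example (made-up data): true groups {0,1,2}, {3,4}, {5}.
rng = np.random.default_rng(0)
n, p = 200, 6
beta_true = np.array([2.0, 2.0, 2.0, -1.0, -1.0, 0.0])
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.1 * rng.standard_normal(n)

rss_hat, beta_hat, labels_hat = l0_fusion_bruteforce(X, y, k=3)
print(labels_hat)              # group label assigned to each coefficient
print(np.round(beta_hat, 2))   # fitted coefficients, constant within each group
```

The number of candidate groupings equals the Stirling number of the second kind S(p, K), so exhaustive search is only feasible for very small p. This growth is one reason for the MIO formulation, the warm start, and the sure screening step described in the abstract.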