Guo Zijian
Department of Statistics, Rutgers University.
J Am Stat Assoc. 2024;119(547):1968-1984. doi: 10.1080/01621459.2023.2233162. Epub 2023 Aug 4.
Integrative analysis of data from multiple sources is critical to making generalizable discoveries. Associations consistently observed across multiple source populations are more likely to generalize to target populations with possible distributional shifts. In this paper, we model heterogeneous multi-source data with multiple high-dimensional regressions and make inferences for the maximin effect (Meinshausen and Bühlmann, Annals of Statistics, 43(4), 1801-1830). The maximin effect provides a measure of stable associations across multi-source data. A significant maximin effect indicates that a variable has commonly shared effects across multiple source populations, and these shared effects may generalize to a broader set of target populations. Inference for maximin effects is challenging because the point estimator can have a non-standard limiting distribution. We devise a novel sampling method to construct valid confidence intervals for maximin effects. The proposed confidence interval attains a parametric length. This sampling procedure and the related theoretical analysis are of independent interest for solving other non-standard inference problems. Using genetic data on yeast growth in multiple environments, we demonstrate that genetic variants with significant maximin effects have generalizable effects under new environments. The proposed method is implemented in the package MaximinInfer, available from CRAN.
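To make the notion of a maximin effect concrete, the sketch below computes it at the population level from known group-wise regression vectors. It relies on the characterization in Meinshausen and Bühlmann: the maximin effect is the convex combination of the group coefficient vectors b_1, ..., b_L whose weights minimize w' Γ w over the simplex, where Γ[k][l] = b_k' Σ b_l. This is an illustration of the definition only, not the sampling-based inference procedure of the paper and not the interface of the CRAN package. The function name `maximin_effect`, the Frank-Wolfe solver, and the default identity covariance are assumptions made for this example.

```python
def maximin_effect(coefs, cov=None, n_iter=1000, tol=1e-12):
    """Population maximin effect for known group coefficients (hypothetical helper).

    coefs: list of L coefficient vectors, each of length p.
    cov:   p x p covariance of the covariates; identity if None (an assumption).
    Returns (beta, weights), where beta = sum_l weights[l] * coefs[l].
    """
    L, p = len(coefs), len(coefs[0])
    if cov is None:
        cov = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]

    # Gamma[k][l] = b_k' Sigma b_l
    sb = [[sum(cov[i][j] * b[j] for j in range(p)) for i in range(p)] for b in coefs]
    gam = [[sum(coefs[k][i] * sb[l][i] for i in range(p)) for l in range(L)]
           for k in range(L)]

    # Frank-Wolfe with exact line search for min_w w' Gamma w over the simplex.
    w = [1.0 / L] * L
    for _ in range(n_iter):
        grad = [2.0 * sum(gam[k][l] * w[l] for l in range(L)) for k in range(L)]
        j = min(range(L), key=grad.__getitem__)   # best vertex of the simplex
        d = [-wk for wk in w]
        d[j] += 1.0                               # direction towards that vertex
        gap = -sum(g * dk for g, dk in zip(grad, d))
        if gap <= tol:                            # Frank-Wolfe gap: near-optimal
            break
        curv = sum(d[k] * gam[k][l] * d[l] for k in range(L) for l in range(L))
        step = 1.0 if curv <= 0.0 else min(1.0, gap / (2.0 * curv))
        w = [wk + step * dk for wk, dk in zip(w, d)]

    beta = [sum(w[l] * coefs[l][i] for l in range(L)) for i in range(p)]
    return beta, w


# Two source groups: the first variable has the same effect in both,
# the second variable has opposite signs across groups.
beta, weights = maximin_effect([[1.0, 1.0], [1.0, -1.0]])
print(beta, weights)  # [1.0, 0.0] [0.5, 0.5]
```

In this toy case the shared effect of the first variable is retained, while the sign-inconsistent effect of the second variable is shrunk to zero, which matches the interpretation of the maximin effect as a measure of stable association.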