Denti Francesco, Camerlenghi Federico, Guindani Michele, Mira Antonietta
Department of Statistics, University of California, Irvine, CA.
Department of Economics, Management and Statistics, University of Milano - Bicocca, Milan, Italy.
J Am Stat Assoc. 2023;118(541):405-416. doi: 10.1080/01621459.2021.1933499. Epub 2021 Jul 14.
The use of large datasets for targeted therapeutic interventions requires new ways to characterize the heterogeneity observed across subgroups of a specific population. In particular, inference on nested datasets, where observations are organized into distinct units, calls for models for partially exchangeable data that share information across units while still learning each unit's distinctive features. In this manuscript, we propose a nested common atoms model (CAM) that is particularly suited to nested datasets in which the distributions of the units are expected to differ only over a small fraction of the observations sampled from each unit. The proposed CAM allows two-layered clustering, at the distributional and the observational level, and is amenable to scalable posterior inference through a computationally efficient nested slice sampler. We also discuss how to extend the modeling framework to handle discrete measurements, and we conduct posterior inference on a real microbiome dataset from a diet swap study to investigate how alterations in intestinal microbiota composition are associated with different eating habits. Finally, a simulation study assesses how well the model captures true distributional structures in the population.
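The two-layered clustering described in the abstract can be illustrated with a small generative simulation. The sketch below is not the authors' implementation and does not perform posterior inference (no slice sampler is included). It assumes a finite truncation of the model, stick-breaking weights at both layers, and a Gaussian kernel with unit variance; the function names, truncation levels `K` and `L`, and hyperparameters `alpha` and `beta` are all hypothetical choices made for illustration. Each unit is assigned to one of `K` candidate distributions (distributional clusters), and every candidate distribution places its own weights on the same set of `L` shared atoms (observational clusters).

```python
import numpy as np

def stick_breaking(concentration, n_sticks, rng):
    """Truncated stick-breaking weights; the last stick is closed so they sum to 1."""
    v = rng.beta(1.0, concentration, size=n_sticks)
    v[-1] = 1.0  # truncation: assign all remaining mass to the final stick
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def simulate_cam(n_units=6, n_obs=50, K=10, L=15, alpha=1.0, beta=1.0, seed=0):
    """Simulate nested data from a truncated common-atoms construction (assumed form)."""
    rng = np.random.default_rng(seed)
    # Common atoms: one set of kernel locations shared by every distribution.
    atoms = rng.normal(0.0, 5.0, size=L)
    # Weights over the K candidate distributions (distributional layer).
    pi = stick_breaking(alpha, K, rng)
    # Each candidate distribution has its own weights over the shared atoms.
    omega = np.stack([stick_breaking(beta, L, rng) for _ in range(K)])  # shape (K, L)
    # Distributional cluster label for each unit.
    dist_label = rng.choice(K, size=n_units, p=pi)
    # Observational cluster label for each observation within each unit.
    obs_label = np.stack([rng.choice(L, size=n_obs, p=omega[k]) for k in dist_label])
    # Observations: Gaussian kernel (assumed) centered at the selected atom.
    y = rng.normal(atoms[obs_label], 1.0)
    return y, dist_label, obs_label, atoms

y, dist_label, obs_label, atoms = simulate_cam()
print(y.shape, dist_label)
```

Units that share a value of `dist_label` have identical distributions, while units with different labels can still share observational clusters because every distribution is built on the same `atoms`. This is the sense in which distributions may differ over only a fraction of the observations.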