MULTIVARIATE MIXED MEMBERSHIP MODELING: INFERRING DOMAIN-SPECIFIC RISK PROFILES.

Author information

Russo Massimiliano, Singer Burton H, Dunson David B

Affiliations

Harvard Medical School and Dana-Farber Cancer Institute.

University of Florida.

Publication information

Ann Appl Stat. 2022 Mar;16(1):391-413. doi: 10.1214/21-aoas1496. Epub 2022 Mar 28.

DOI: 10.1214/21-aoas1496

PMID: 35757598

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9222983/

Abstract

Characterizing the shared memberships of individuals in a classification scheme poses severe interpretability issues, even when using a moderate number of classes (say 4). Mixed membership models quantify this phenomenon, but they typically focus on goodness-of-fit more than on interpretable inference. To achieve a good numerical fit, these models may in fact require many extreme profiles, making the results difficult to interpret. We introduce a new class of multivariate mixed membership models that, when variables can be partitioned into subject-matter based domains, can provide a good fit to the data using fewer profiles than standard formulations. The proposed model explicitly accounts for the blocks of variables corresponding to the distinct domains along with a cross-domain correlation structure, which provides new information about shared membership of individuals in a complex classification scheme. We specify a multivariate logistic normal distribution for the membership vectors, which allows easy introduction of auxiliary information leveraging a latent multivariate logistic regression. A Bayesian approach to inference, relying on Pólya gamma data augmentation, facilitates efficient posterior computation via Markov Chain Monte Carlo. We apply this methodology to a spatially explicit study of malaria risk over time on the Brazilian Amazon frontier.
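
To make the generative structure described above more concrete, here is a minimal forward-simulation sketch in Python (standard library only). It is not the authors' implementation, and it rests on several assumptions:

- The variables are categorical.
- Each domain has its own set of K_d extreme profiles.
- The logistic normal membership vectors come from correlated Gaussian log-odds with mean B x (a latent multivariate regression on covariates x). These log-odds are mapped through a softmax in which the last profile of each domain is the reference.

The names `cholesky`, `softmax_with_reference`, `sample_memberships`, `sample_responses`, `B`, `Sigma` and `theta` are all hypothetical. The sketch covers data generation only, not Pólya-gamma augmented posterior inference.

```python
# Hypothetical illustration of a multivariate (multi-domain) mixed membership model;
# NOT the authors' code. Assumes categorical variables and a reference-category softmax.
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor L of a symmetric positive-definite matrix a (a = L L^T)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def softmax_with_reference(eta):
    """Map K-1 log-odds to a point on the K-simplex; the last profile is the reference (log-odds 0)."""
    ex = [math.exp(e) for e in eta] + [1.0]
    total = sum(ex)
    return [e / total for e in ex]


def sample_memberships(x, B, Sigma, domain_sizes, rng):
    """Draw one individual's membership vectors, one per domain (logistic normal sketch).

    x: covariate vector (length p).
    B: one coefficient row per latent coordinate, so the latent mean is B x.
    Sigma: covariance of all latent coordinates stacked across domains; its
           off-diagonal blocks encode cross-domain correlation.
    domain_sizes: number of extreme profiles K_d per domain; domain d uses K_d - 1 coordinates.
    """
    mean = [sum(b * xi for b, xi in zip(row, x)) for row in B]
    L = cholesky(Sigma)
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    # Correlated Gaussian draw: eta = B x + L z, so Cov(eta) = L L^T = Sigma.
    eta = [m + sum(L[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mean)]
    memberships, start = [], 0
    for K in domain_sizes:
        memberships.append(softmax_with_reference(eta[start:start + K - 1]))
        start += K - 1
    return memberships


def sample_responses(memberships, domain_of_var, theta, rng):
    """For each variable j, pick an extreme profile h from its domain's membership vector,
    then draw a category from that profile's response probabilities theta[j][h]."""
    y = []
    for j, d in enumerate(domain_of_var):
        lam = memberships[d]
        h = rng.choices(range(len(lam)), weights=lam)[0]
        probs = theta[j][h]
        y.append(rng.choices(range(len(probs)), weights=probs)[0])
    return y


if __name__ == "__main__":
    rng = random.Random(1)
    # Two domains with two profiles each, strongly correlated across domains (0.8).
    Sigma = [[1.0, 0.8], [0.8, 1.0]]
    B = [[0.2, -1.0], [0.0, 1.5]]
    lam = sample_memberships([1.0, 0.5], B, Sigma, [2, 2], rng)
    # Three binary variables: the first two belong to domain 0, the third to domain 1.
    theta = [[[0.9, 0.1], [0.2, 0.8]],
             [[0.7, 0.3], [0.1, 0.9]],
             [[0.5, 0.5], [0.05, 0.95]]]
    print(lam, sample_responses(lam, [0, 0, 1], theta, rng))
```

Setting the off-diagonal entries of `Sigma` to zero makes the memberships in different domains independent, because the latent log-odds are jointly Gaussian. Increasing those entries ties an individual's profile mixture in one domain to their mixture in another. This mimics the kind of cross-domain correlation structure the abstract refers to.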
