Multivariate Mixed Membership Modeling: Inferring Domain-Specific Risk Profiles.

Author information

Russo Massimiliano, Singer Burton H, Dunson David B

Affiliations

Harvard Medical School and Dana-Farber Cancer Institute.

University of Florida.

Publication information

Ann Appl Stat. 2022 Mar;16(1):391-413. doi: 10.1214/21-aoas1496. Epub 2022 Mar 28.

Abstract

Characterizing the shared memberships of individuals in a classification scheme poses severe interpretability issues, even when using a moderate number of classes (say 4). Mixed membership models quantify this phenomenon, but they typically focus on goodness-of-fit more than on interpretable inference. To achieve a good numerical fit, these models may in fact require many extreme profiles, making the results difficult to interpret. We introduce a new class of multivariate mixed membership models that, when variables can be partitioned into subject-matter based domains, can provide a good fit to the data using fewer profiles than standard formulations. The proposed model explicitly accounts for the blocks of variables corresponding to the distinct domains along with a cross-domain correlation structure, which provides new information about shared membership of individuals in a complex classification scheme. We specify a multivariate logistic normal distribution for the membership vectors, which allows easy introduction of auxiliary information leveraging a latent multivariate logistic regression. A Bayesian approach to inference, relying on Pólya-gamma data augmentation, facilitates efficient posterior computation via Markov chain Monte Carlo. We apply this methodology to a spatially explicit study of malaria risk over time on the Brazilian Amazon frontier.
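To make the structure described in the abstract concrete, the forward (generative) side of such a model can be sketched as a small simulation. This is an illustrative sketch under stated assumptions, not the authors' specification: it assumes an additive log-ratio link (softmax with the last profile as reference), categorical responses, Dirichlet-drawn profile response probabilities, and a single Gaussian error vector shared across domains whose covariance `Sigma` carries the cross-domain correlation. All names (`alr_inverse`, `simulate_domains`, `profiles_per_domain`, `vars_per_domain`, `n_levels`) are hypothetical, and the Pólya-gamma MCMC sampler used for inference is not implemented here.

```python
import numpy as np


def alr_inverse(eta):
    """Map K-1 latent log-ratios to a K-dimensional membership vector.

    Assumption: the last profile is the reference (log-ratio fixed at 0),
    so this is the softmax of [eta, 0] over the last axis.
    """
    zeros = np.zeros(eta.shape[:-1] + (1,))
    z = np.concatenate([eta, zeros], axis=-1)
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum(axis=-1, keepdims=True)


def simulate_domains(X, B, Sigma, profiles_per_domain, vars_per_domain,
                     n_levels, seed=0):
    """Simulate domain-specific mixed memberships and categorical responses.

    X: (n, p) covariates; B: (L, p) latent regression coefficients;
    Sigma: (L, L) joint error covariance, with L = sum(K_d - 1) over domains.
    Off-diagonal blocks of Sigma induce cross-domain correlation.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    dims = [k - 1 for k in profiles_per_domain]
    total = sum(dims)
    if B.shape != (total, X.shape[1]) or Sigma.shape != (total, total):
        raise ValueError("B and Sigma must match the total latent dimension")

    # Latent multivariate logistic regression: eta_i = B x_i + eps_i,
    # with one eps_i ~ N(0, Sigma) spanning all domains jointly.
    eta = X @ B.T + rng.multivariate_normal(np.zeros(total), Sigma, size=n)

    memberships, columns = [], []
    start = 0
    for k, d, n_vars in zip(profiles_per_domain, dims, vars_per_domain):
        lam = alr_inverse(eta[:, start:start + d])  # (n, k) memberships in this domain
        start += d
        memberships.append(lam)
        # Hypothetical profile-specific response probabilities:
        # psi[j, h] is the distribution of variable j under profile h.
        psi = rng.dirichlet(np.ones(n_levels), size=(n_vars, k))
        for j in range(n_vars):
            # Each subject picks a profile for variable j from its own
            # membership vector in this domain, then draws a response.
            z = np.array([rng.choice(k, p=lam[i]) for i in range(n)])
            y = np.array([rng.choice(n_levels, p=psi[j, z[i]]) for i in range(n)])
            columns.append(y)
    return memberships, np.column_stack(columns)


# Usage: two hypothetical domains with 3 and 2 profiles (L = 2 + 1 = 3).
rng_x = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng_x.normal(size=n)])  # intercept + one covariate
B = np.zeros((3, 2))
Sigma = np.array([[1.0, 0.3, 0.5],
                  [0.3, 1.0, 0.2],
                  [0.5, 0.2, 1.0]])
memberships, Y = simulate_domains(X, B, Sigma, [3, 2], [4, 3], n_levels=3)
```

In this sketch, setting the off-diagonal blocks of `Sigma` to zero makes the two domains' memberships independent given the covariates; nonzero cross-domain entries are what encode the correlation structure the abstract refers to, while `B` plays the role of the latent regression on auxiliary information.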

Similar articles

Simplex Factor Models for Multivariate Unordered Categorical Data.
J Am Stat Assoc. 2012 Mar 1;107(497):362-377. doi: 10.1080/01621459.2011.646934.

Fast Moment Estimation for Generalized Latent Dirichlet Models.
J Am Stat Assoc. 2018;113(524):1528-1540. doi: 10.1080/01621459.2017.1341839. Epub 2018 Nov 13.
