Multiscale analysis of count data through topic alignment.

Author information

Fukuyama Julia, Sankaran Kris, Symul Laura

Affiliations

Department of Statistics, Indiana University Bloomington, 919 E 10th Street, Bloomington, IN 47408, USA.

Department of Statistics, University of Wisconsin - Madison, 1300 University Ave, Madison, WI 53706, USA.

Publication information

Biostatistics. 2023 Oct 18;24(4):1045-1065. doi: 10.1093/biostatistics/kxac018.

Abstract

Topic modeling is a popular method used to describe biological count data. With topic models, the user must specify the number of topics $K$. Since there is no definitive way to choose $K$ and since a true value might not exist, we develop a method, which we call topic alignment, to study the relationships across models with different $K$. In addition, we present three diagnostics based on the alignment. These techniques can show how many topics are consistently present across different models, if a topic is only transiently present, or if a topic splits into more topics when $K$ increases. This strategy gives more insight into the process of generating the data than choosing a single value of $K$ would. We design a visual representation of these cross-model relationships, show the effectiveness of these tools for interpreting the topics on simulated and real data, and release an accompanying R package, alto.
