IEEE Trans Vis Comput Graph. 2018 Jan;24(1):382-391. doi: 10.1109/TVCG.2017.2745080. Epub 2017 Aug 29.
Topic modeling algorithms are widely used to analyze the thematic composition of text corpora but remain difficult to interpret and adjust. To address these limitations, we present a modular visual analytics framework that tackles the understandability and adaptability of topic models through a user-driven reinforcement learning process, which does not require a deep understanding of the underlying topic modeling algorithms. Given a document corpus, our approach initializes two algorithm configurations based on a parameter space analysis that enhances document separability. We abstract the model complexity in an interactive visual workspace for exploring the automatic matching results of the two models, investigating topic summaries, analyzing parameter distributions, and reviewing documents. The main contribution of our work is an iterative decision-making technique in which users provide document-based relevance feedback that allows the framework to converge to a user-endorsed topic distribution. We also report feedback from a two-stage study, which shows that our technique improves topic model quality on two independent measures.
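The abstract mentions automatic matching between the topics of two models but does not say how the matching is computed. As a minimal illustrative sketch, assuming each topic is represented as a sparse word-weight dictionary and that similarity is measured by cosine similarity (both are assumptions made here, not details from the paper), a greedy one-to-one matching could look as follows. The names `cosine` and `match_topics` and the toy models are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse word-weight dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_topics(model_a, model_b):
    """Greedy one-to-one matching: take the most similar pair first,
    then the next most similar pair among the topics still unmatched."""
    pairs = sorted(
        ((cosine(ta, tb), i, j)
         for i, ta in enumerate(model_a)
         for j, tb in enumerate(model_b)),
        reverse=True,
    )
    used_a, used_b, matches = set(), set(), []
    for score, i, j in pairs:
        if i not in used_a and j not in used_b:
            matches.append((i, j, score))
            used_a.add(i)
            used_b.add(j)
    return matches


# Toy topics (hypothetical data): each dict maps a word to its weight.
model_a = [{"game": 0.6, "team": 0.4}, {"vote": 0.7, "party": 0.3}]
model_b = [{"vote": 0.5, "election": 0.5}, {"team": 0.5, "game": 0.5}]
matches = match_topics(model_a, model_b)
```

In this toy example, topic 0 of the first model pairs with topic 1 of the second, and topic 1 with topic 0. Greedy matching is simple but not guaranteed to maximize total similarity; an optimal assignment solver such as `scipy.optimize.linear_sum_assignment` could be substituted if that matters.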