Mao Yi, Dillon Joshua, Lebanon Guy
Computer Engineering, Purdue University, West Lafayette, USA.
IEEE Trans Vis Comput Graph. 2007 Nov-Dec;13(6):1208-15. doi: 10.1109/TVCG.2007.70592.
Documents and other categorical-valued time series are often characterized by the frequencies of short-range sequential patterns such as n-grams. This representation converts sequential data of varying lengths into high-dimensional histogram vectors, which are easily modeled by standard statistical models. Unfortunately, the histogram representation ignores most medium- and long-range sequential dependencies, making it unsuitable for visualizing sequential data. We present a novel framework for sequential visualization of discrete categorical time series based on the idea of local statistical modeling. The framework embeds categorical time series as smooth curves in the multinomial simplex, summarizing the progression of sequential trends. We discuss several visualization techniques based on this framework and demonstrate their usefulness for document visualization.
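The idea of embedding a categorical sequence as a smooth curve in the multinomial simplex can be illustrated with a minimal sketch. The abstract does not specify the local model, so everything below is an assumption made for illustration: the function name `local_histogram_curve`, the Gaussian kernel, the `bandwidth` and `smoothing` parameters, and the mapping of token positions to the interval [0, 1]. Each row of the result is a kernel-weighted local histogram, that is, a probability vector over the vocabulary at one location in the sequence.

```python
import numpy as np

def local_histogram_curve(tokens, vocab, num_points=50, bandwidth=0.1, smoothing=1e-3):
    """Return an array of shape (num_points, len(vocab)).

    Each row is a locally weighted histogram of the sequence and lies in
    the multinomial simplex (non-negative entries that sum to 1).
    All parameter names and defaults are illustrative assumptions.
    """
    index = {w: i for i, w in enumerate(vocab)}
    n = len(tokens)

    # One-hot matrix: row t marks which category occurs at position t.
    onehot = np.zeros((n, len(vocab)))
    onehot[np.arange(n), [index[w] for w in tokens]] = 1.0

    # Map token positions and curve locations to the interval [0, 1].
    positions = (np.arange(n) + 0.5) / n
    locations = np.linspace(0.0, 1.0, num_points)

    # Gaussian kernel weights (an assumed kernel), shape (num_points, n),
    # normalized so the weights at each location sum to 1.
    diffs = (locations[:, None] - positions[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    weights /= weights.sum(axis=1, keepdims=True)

    # Kernel-weighted local histograms; additive smoothing keeps every
    # point strictly inside the simplex.
    curve = weights @ onehot + smoothing
    return curve / curve.sum(axis=1, keepdims=True)

# Toy sequence: the first half is all "a", the second half is all "b".
tokens = ["a"] * 20 + ["b"] * 20
curve = local_histogram_curve(tokens, vocab=["a", "b"])
```

In this sketch a small `bandwidth` tracks local changes closely, while a very large one flattens the curve toward the ordinary global histogram, which is the representation the abstract describes as discarding medium- and long-range structure.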