Barter Rebecca L, Yu Bin
Department of Statistics, University of California, Berkeley.
J Comput Graph Stat. 2018;27(4):910-922. doi: 10.1080/10618600.2018.1473780. Epub 2018 Aug 20.
The technological advancements of the modern era have enabled the collection of huge amounts of data in science and beyond. Extracting useful information from such massive datasets is an ongoing challenge as traditional data visualization tools typically do not scale well in high-dimensional settings. An existing visualization technique that is particularly well suited to visualizing large datasets is the heatmap. Although heatmaps are extremely popular in fields such as bioinformatics, they remain a severely underutilized visualization tool in modern data analysis. This paper introduces superheat, a new R package that provides an extremely flexible and customizable platform for visualizing complex datasets. Superheat produces attractive and extendable heatmaps to which the user can add a response variable as a scatterplot, model results as boxplots, correlation information as barplots, and more. The goal of this paper is two-fold: (1) to demonstrate the potential of the heatmap as a core visualization method for a range of data types, and (2) to highlight the customizability and ease of implementation of the superheat R package for creating beautiful and extendable heatmaps. The capabilities and fundamental applicability of the superheat package will be explored via three reproducible case studies, each based on publicly available data sources.