Myint Leslie, Hadavand Aboozar, Jager Leah, Leek Jeffrey
Department of Mathematics, Statistics, and Computer Science, Macalester College, 1600 Grand Ave, Saint Paul, MN 55105.
Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe St, Baltimore, MD 21212.
J Stat Educ. 2020;28(1):98-108. doi: 10.1080/10691898.2019.1695554. Epub 2019 Dec 23.
We performed an empirical study of the perceived quality of scientific graphics produced by beginning R users in two plotting systems: the base graphics package ("base R") and the ggplot2 add-on package. In our experiment, students taking a data science course on the Coursera platform were randomized to complete identical plotting exercises using either base R or ggplot2. The exercise involved creating two plots: one bivariate scatterplot and one plot of a multivariate relationship that necessitated using color or panels. Students evaluated their peers' plots on visual characteristics key to clear scientific communication, including plot clarity and sufficient labeling. We observed that graphics created with the two systems were rated similarly on many characteristics. However, students generally perceived ggplot2 graphics to be slightly clearer overall in presenting a scientific relationship. This difference was more pronounced for the multivariate relationship. Through expert analysis of submissions, we also found that certain concrete plot features (e.g., trend lines, axis labels, legends, panels, and color) tended to be used more commonly in one system than the other. These observations may help educators emphasize plot features that correct common student mistakes.