QUEST Center for Responsible Research, Berlin Institute of Health at Charité, Universitätsmedizin Berlin, Berlin, Germany.
Clin Sci (Lond). 2022 Aug 12;136(15):1139-1156. doi: 10.1042/CS20220287.
Recent work has raised awareness about the need to replace bar graphs of continuous data with informative graphs showing the data distribution. The impact of these efforts is not known. The present observational meta-research study examined how often scientists in different fields use various graph types, and assessed whether visualization practices have changed between 2010 and 2020. We developed and validated an automated screening tool, designed to identify bar graphs of counts or proportions, bar graphs of continuous data, bar graphs with dot plots, dot plots, box plots, violin plots, histograms, pie charts, and flow charts. Papers from 23 fields (approximately 1000 papers/field per year) were randomly selected from PubMed Central and screened (n=227998). F1 scores for different graphs ranged between 0.83 and 0.95 in the internal validation set. While the tool also performed well in external validation sets, F1 scores were lower for uncommon graphs. Bar graphs are more often used incorrectly to display continuous data than they are used correctly to display counts or proportions. The proportion of papers that use bar graphs of continuous data varies markedly across fields (range in 2020: 4-58%), with high rates in biochemistry and cell biology, complementary and alternative medicine, physiology, genetics, oncology and carcinogenesis, pharmacology, microbiology and immunology. Visualization practices have improved in some fields in recent years. Fewer than 25% of papers use flow charts, which provide information about attrition and the risk of bias. The present study highlights the need for continued interventions to improve visualization and identifies fields that would benefit most.
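For readers unfamiliar with the metric reported above, the F1 score is the harmonic mean of precision and recall. The sketch below shows that calculation for a single graph class. The function name and the counts are hypothetical illustrations and are not taken from the study; it is not the authors' evaluation code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall).

    tp, fp, fn are true-positive, false-positive and false-negative
    counts for one graph class (e.g. "bar graph of continuous data").
    """
    if tp == 0:
        # No correct detections: precision and recall are both 0 (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)  # share of detections that were correct
    recall = tp / (tp + fn)     # share of true instances that were detected
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not data from the study).
score = f1_score(tp=90, fp=10, fn=10)
print(f"F1 = {score:.2f}")
```

Because F1 depends on both false positives and false negatives, a class with few true instances can score lower even when the detector makes few errors in absolute terms, which is consistent with the lower scores reported for uncommon graphs.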