School of Information Studies, Syracuse University, Syracuse, New York, United States of America.
PLoS Comput Biol. 2021 Dec 13;17(12):e1009650. doi: 10.1371/journal.pcbi.1009650. eCollection 2021 Dec.
Academic graphs are essential for communicating complex scientific ideas and results. To ensure that these graphs truthfully reflect underlying data and relationships, visualization researchers have proposed several principles to guide the graph creation process. However, the extent to which academic publications violate these principles is unknown. In this work, we develop a deep learning-based method to accurately measure violations of the proportional ink principle (AUC = 0.917), which states that the size of shaded areas in graphs should be consistent with their corresponding quantities. We apply our method to a large sample of bar charts contained in 300K figures from open access publications. We estimate that 5% of bar charts contain proportional ink violations. Further analysis reveals that these graphical integrity issues are significantly more prevalent in some research fields, such as psychology and computer science, and in some regions of the globe. Additionally, we find no temporal or seniority trends in violations. Finally, apart from openly releasing our large annotated dataset and method, we discuss how computational research integrity could become part of the peer review and publication processes.
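To make the proportional ink principle concrete, the following minimal Python sketch shows how a bar chart whose value axis does not start at zero exaggerates differences between bars. It is only an illustration of the principle, not the deep learning method described above; the function names `violates_proportional_ink` and `ink_ratio_distortion`, and the assumption that the axis minimum is already known, are hypothetical.

```python
def violates_proportional_ink(axis_min: float, tol: float = 1e-9) -> bool:
    """Return True if a linear-scale bar chart's value axis does not start at zero.

    Assumption for this sketch: bars are drawn from axis_min upward,
    so any nonzero axis_min makes bar height disproportionate to value.
    """
    return abs(axis_min) > tol


def ink_ratio_distortion(a: float, b: float, axis_min: float) -> float:
    """Return how strongly the drawn height ratio exaggerates the true ratio a/b.

    A result of 1.0 means the drawn bars are proportional to the values.
    """
    if b == 0 or b == axis_min:
        raise ValueError("b must be nonzero and different from axis_min")
    true_ratio = a / b
    drawn_ratio = (a - axis_min) / (b - axis_min)
    return drawn_ratio / true_ratio


# Two bars with values 60 and 50, drawn on an axis that starts at 40:
# the true ratio is 1.2, but the drawn heights (20 and 10) have ratio 2.0.
distortion = ink_ratio_distortion(60, 50, axis_min=40)
print(violates_proportional_ink(40), round(distortion, 3))
```

With the axis starting at zero, the same two bars give a distortion of exactly 1.0, which is the case the principle asks for.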