Visualising data science workflows to support third-party notebook comprehension: an empirical study.

Authors

Ramasamy Dhivyabharathi, Sarasua Cristina, Bacchelli Alberto, Bernstein Abraham

Affiliation

Department of Informatics, University of Zurich, Zurich, Switzerland.

Publication information

Empir Softw Eng. 2023;28(3):58. doi: 10.1007/s10664-023-10289-9. Epub 2023 Mar 23.

Abstract

Data science is an exploratory and iterative process that often leads to complex and unstructured code. This code is usually poorly documented and, consequently, hard for a third party to understand. In this paper, we first collect empirical evidence for the non-linearity of data science code from real-world Jupyter notebooks, confirming the need for new approaches that aid in data science code interaction and comprehension. Second, we propose a visualisation method that elucidates implicit workflow information in data science code and assists data scientists in navigating non-linear code. The visualisation also provides information such as the rationale and the identification of the data science pipeline step based on cell annotations. We conducted a user experiment with data scientists to evaluate the proposed method, assessing the influence of (i) different workflow visualisations and (ii) cell annotations on code comprehension. Our results show that visualising the exploration helps the users obtain an overview of the notebook, significantly improving code comprehension. Furthermore, our qualitative analysis provides more insights into the difficulties faced during data science code comprehension.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3de/10036289/5bde75959817/10664_2023_10289_Fig1_HTML.jpg
