Frachtenberg Eitan
Computer Science, Reed College, Portland, OR, United States of America.
PeerJ Comput Sci. 2022 Feb 7;8:e887. doi: 10.7717/peerj-cs.887. eCollection 2022.
Research in computer systems often involves the engineering, implementation, and measurement of complex systems software and data. The availability of these artifacts is critical to the reproducibility and replicability of the research results, because system software often embodies numerous implicit assumptions and parameters that are not fully documented in the research article itself. Artifact availability has also been previously associated with higher paper impact, as measured by citation counts. And yet, the sharing of research artifacts is still not as common as warranted by its importance. The primary goal of this study is to provide an exploratory statistical analysis of the artifact-sharing rates and associated factors in the research field of computer systems. To this end, we explore a cross-sectional dataset of papers from 56 contemporaneous systems conferences. In addition to extensive data on the conferences, papers, and authors, this dataset includes data on the release, ongoing availability, badging, and locations of research artifacts. We combine this manually curated dataset with citation counts to evaluate the relationships between different artifact properties and citation metrics. Additionally, we revisit previous observations from other fields on the relationships between artifact properties and various other characteristics of papers, authors, and venues, and apply them to this field. The overall rate of artifact sharing we find in this dataset is approximately 30%, although it varies significantly with paper, author, and conference factors, and it is closer to 43% for conferences that actively evaluated artifact sharing. Approximately 20% of all shared artifacts are no longer accessible four years after publication, predominantly when hosted on personal and academic websites. Our main finding is that papers with shared artifacts averaged approximately 75% more citations than papers with none. Even after controlling for numerous confounding covariates, the release of an artifact appears to increase the citations of a systems paper by some 34%. This metric is further boosted by the open availability of the paper's text.