Frachtenberg Eitan
Computer Science, Reed College, Portland, OR, United States of America.
PeerJ Comput Sci. 2023 May 16;9:e1389. doi: 10.7717/peerj-cs.1389. eCollection 2023.
Citation analysis is used extensively in the bibliometrics literature to assess the impact of individual works, researchers, institutions, and even entire fields of study. In this article, we analyze citations in one large and influential field within computer science, namely computer systems. Using citation data from a cross-sectional sample of 2,088 papers in 50 systems conferences from 2017, we examine four areas: the overall distribution of systems citations; their evolution over time; the differences between citation databases (Google Scholar and Scopus); and the characteristics of self-citations in the field. On citation distribution, we find that systems papers were well cited overall, with the most cited subfields and conference areas within systems being security, databases, and computer architecture. Only 1.5% of papers remained uncited after five years, while 12.8% accrued at least 100 citations. For the second area, we find that most papers received their first citation within a year of publication, and that the median citation count continued to grow at an almost linear rate over five years, with only a few papers peaking before that. We also find that early citations could be linked to papers with a freely available preprint, or may consist primarily of self-citations. For the third area, it appears that the choice of citation database makes little difference in relative citation comparisons, despite marked differences in absolute counts. For the fourth area, we find that the ratio of self-citations to total citations starts relatively high for most papers but appears to stabilize by 12 to 18 months, at which point highly cited papers revert to predominantly external citations. Past self-citation count (taken from each paper's reference list) appears to bear little if any relationship to the future self-citation count of each paper.
The primary practical implication of these results is that the impact of systems papers, as measured in citations, tends to be high relative to comparable studies of other fields and that it takes at least five years to stabilize. A secondary implication is that at least for this field, Google Scholar appears to be a reliable source of citation data for relative comparisons.
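As a rough illustration of two quantities discussed above, the sketch below computes a self-citation ratio and a rank correlation between two sets of citation counts. It is not the article's method: the use of Spearman's rank correlation, the function names, and the toy counts are all assumptions made here for illustration, not data from the study. The point is only that two databases can disagree on absolute counts while agreeing on the relative ordering of papers.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def self_citation_ratio(self_cites, total_cites):
    """Share of a paper's citations that are self-citations (0.0 if uncited)."""
    return self_cites / total_cites if total_cites else 0.0


# Hypothetical counts for five papers in two databases (not real data).
counts_db_a = [120, 45, 8, 300, 0]
counts_db_b = [70, 30, 5, 180, 0]

rho = spearman(counts_db_a, counts_db_b)
ratio = self_citation_ratio(3, 12)
```

In this toy example the absolute counts differ substantially, yet the papers are ordered identically in both lists, so the rank correlation is at its maximum.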