Šubelj Lovro, Fiala Dalibor, Bajec Marko
University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, SI-1000 Ljubljana, Slovenia.
University of West Bohemia, Faculty of Applied Sciences, Univerzitní 8, CZ-30614 Plzeň, Czech Republic.
Sci Rep. 2014 Sep 29;4:6496. doi: 10.1038/srep06496.
Modern bibliographic databases provide the basis for scientific research and its evaluation. While their content and structure differ substantially, only informal notions of their reliability exist. Here we compare the topological consistency of citation networks extracted from six popular bibliographic databases, including Web of Science, CiteSeer and arXiv.org. The networks are assessed through a rich set of local and global graph statistics. We first reveal statistically significant inconsistencies between some of the databases with respect to individual statistics. For example, the introduced field bow-tie decomposition of DBLP Computer Science Bibliography differs substantially from the rest due to the coverage of the database, while the citation information within arXiv.org is the most exhaustive. Finally, we compare the databases over multiple graph statistics using the critical difference diagram. The citation topology of DBLP Computer Science Bibliography is the least consistent with the rest, while, not surprisingly, Web of Science is significantly more reliable from the perspective of consistency. This work can serve either as a reference for scholars in bibliometrics and scientometrics or as a scientific evaluation guideline for governments and research agencies.
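The critical difference diagram mentioned above is conventionally built from average ranks and a Nemenyi critical difference. The following is a minimal sketch of that generic computation only; it is not the authors' pipeline. The database labels and scores are hypothetical toy values chosen for illustration, where a lower score is assumed to mean a smaller deviation from the other databases. The function names are likewise illustrative.

```python
import math

# Nemenyi critical values q_alpha for alpha = 0.05, indexed by the number of
# compared groups k (standard tabulated values for the two-tailed test).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850}


def rank_row(values):
    """Rank values in ascending order (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of entries tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def average_ranks(scores):
    """Average the per-row ranks; rows are statistics, columns are databases."""
    rows = [rank_row(row) for row in scores]
    k = len(scores[0])
    return [sum(row[c] for row in rows) / len(rows) for c in range(k)]


def critical_difference(k, n):
    """Nemenyi critical difference for k groups ranked over n rows (alpha = 0.05)."""
    return Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6.0 * n))


# Hypothetical toy data: 4 graph statistics (rows) for 3 made-up databases.
DATABASES = ["A", "B", "C"]
SCORES = [
    [0.1, 0.5, 0.3],
    [0.2, 0.6, 0.4],
    [0.1, 0.4, 0.4],
    [0.3, 0.9, 0.5],
]

if __name__ == "__main__":
    avg = average_ranks(SCORES)
    cd = critical_difference(len(DATABASES), len(SCORES))
    for name, rank in zip(DATABASES, avg):
        print(name, rank)
    print("critical difference:", cd)
```

Two databases whose average ranks differ by more than the critical difference would be reported as significantly different; in a diagram, groups that are not significantly different are usually joined by a bar.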