Clickstream data yields high-resolution maps of science.

Author information

Johan Bollen, Herbert Van de Sompel, Aric Hagberg, Luis Bettencourt, Ryan Chute, Marko A. Rodriguez, Lyudmila Balakireva

Affiliation

Digital Library Research and Prototyping Team, Research Library, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America.

Publication information

PLoS One. 2009;4(3):e4803. doi: 10.1371/journal.pone.0004803. Epub 2009 Mar 11.

Abstract

BACKGROUND

Intricate maps of science have been created from citation data to visualize the structure of scientific activity. However, most scientific publications are now accessed online. Scholarly web portals record detailed log data at a scale that exceeds the number of all existing citations combined. Such log data is recorded immediately upon publication and keeps track of the sequences of user requests (clickstreams) that are issued by a variety of users across many different domains. Given these advantages of log datasets over citation data, we investigate whether they can produce high-resolution, more current maps of science.

METHODOLOGY

Over the course of 2007 and 2008, we collected nearly 1 billion user interactions recorded by the scholarly web portals of some of the most significant publishers, aggregators and institutional consortia. The resulting reference data set covers a significant part of world-wide use of scholarly web portals in 2006, and provides a balanced coverage of the humanities, social sciences, and natural sciences. A journal clickstream model, i.e. a first-order Markov chain, was extracted from the sequences of user interactions in the logs. The clickstream model was validated by comparing it to the Getty Research Institute's Art & Architecture Thesaurus. The resulting model was visualized as a journal network that outlines the relationships between various scientific domains and clarifies the connection of the social sciences and humanities to the natural sciences.
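The abstract does not say how the first-order Markov chain was computed. One minimal sketch follows. It assumes each clickstream has already been reduced to a time-ordered list of journal identifiers for a single user session. Under that assumption, the chain can be estimated by counting consecutive journal pairs and normalizing each journal's outgoing counts into probabilities. The function names `transition_probabilities` and `journal_network`, the `threshold` parameter, the self-transition filter, and the sample sessions are all hypothetical illustrations, not the authors' method or data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def transition_probabilities(
    clickstreams: Iterable[Sequence[str]],
) -> Dict[str, Dict[str, float]]:
    """Estimate P(next journal | current journal) from session clickstreams."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for stream in clickstreams:
        # First-order assumption: the next request depends only on the current
        # one, so each consecutive pair (src, dst) is one observed transition.
        for src, dst in zip(stream, stream[1:]):
            counts[src][dst] += 1
    # Normalize each row so a journal's outgoing probabilities sum to 1.
    probs: Dict[str, Dict[str, float]] = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


def journal_network(
    probs: Dict[str, Dict[str, float]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Hypothetical pruning step: keep directed edges with probability
    >= threshold, dropping self-transitions (repeat requests to one journal)."""
    return [
        (src, dst, p)
        for src, row in probs.items()
        for dst, p in row.items()
        if src != dst and p >= threshold
    ]


# Made-up sessions for illustration only.
sessions = [
    ["Nature", "Science", "Cell"],
    ["Nature", "Science"],
    ["Nature", "Cell"],
    ["Science", "Cell"],
]
probs = transition_probabilities(sessions)
edges = journal_network(probs, threshold=0.5)
print(edges)  # → [('Nature', 'Science', 0.6666666666666666), ('Science', 'Cell', 1.0)]
```

Rows exist only for journals with at least one outgoing request, so a journal that only ever ends a session (here `Cell`) has no row. The `threshold` value is arbitrary and would need tuning against real log data before any edge list is passed to a network layout tool.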

CONCLUSIONS

Maps of science resulting from large-scale clickstream data provide a detailed, contemporary view of scientific activity and correct the underrepresentation of the social sciences and humanities that is commonly found in citation data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/655e/2652715/6defa6eeaaf6/pone.0004803.g001.jpg