Kawamura Takahiro, Watanabe Katsutaro, Matsumoto Naoya, Egami Shusaku, Jibu Mari
Japan Science and Technology Agency, Tokyo, Japan.
Scientometrics. 2018;116(2):941-958. doi: 10.1007/s11192-018-2783-x. Epub 2018 May 28.
Maps of science, which represent the structure of science, can help us understand science and technology (S&T) development. Studies have thus developed techniques for analyzing the relationships among research activities; however, inter-citation and co-citation analysis are difficult to apply to ongoing research projects and recently published papers. Therefore, in order to characterize what is currently being attempted in the scientific landscape, this paper proposes a new content-based method of locating research projects in a multi-dimensional space using recent word/paragraph embedding techniques. Specifically, to address a problem associated with the original paragraph vectors, we introduce paragraph vectors based on the information entropies of concepts in an S&T thesaurus. The experimental results show that the proposed method successfully formed a clustered map from 25,607 project descriptions of the EU's 7th Framework Programme from 2006 to 2016 and 34,192 project descriptions of the National Science Foundation from 2012 to 2016.
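The abstract does not say how the information entropies of thesaurus concepts are computed or how they enter the paragraph vectors. As a minimal illustration only, and not the authors' method, the sketch below computes the Shannon entropy of each concept's occurrence distribution over a set of documents. The function name `concept_entropies`, the toy documents, and the choice of this particular distribution are all assumptions made for the example.

```python
import math
from collections import Counter


def concept_entropies(docs):
    """Return {concept: Shannon entropy in bits} for each concept.

    docs is a list of documents, each a list of concept labels.
    The entropy is taken over the concept's occurrence distribution
    across documents (an assumed definition, for illustration only).
    """
    per_doc = [Counter(doc) for doc in docs]

    # Total occurrences of each concept over the whole collection.
    totals = Counter()
    for counts in per_doc:
        totals.update(counts)

    entropies = {}
    for concept, total in totals.items():
        h = 0.0
        for counts in per_doc:
            n = counts.get(concept, 0)
            if n:
                p = n / total  # share of this concept's occurrences in this document
                h -= p * math.log2(p)
        entropies[concept] = h
    return entropies


# Hypothetical toy data: four "project descriptions" reduced to concept labels.
docs = [
    ["graphene", "model"],
    ["protein", "model"],
    ["graphene", "graphene", "model"],
    ["model"],
]
ent = concept_entropies(docs)
for concept in sorted(ent):
    print(concept, round(ent[concept], 3))
```

Under this assumed definition, a concept spread evenly over many documents (here `model`) gets high entropy, while a concept confined to one document (here `protein`) gets zero entropy.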