Wang Kuansan, Shen Zhihong, Huang Chiyuan, Wu Chieh-Han, Eide Darrin, Dong Yuxiao, Qian Junjie, Kanakia Anshul, Chen Alvin, Rogahn Richard
Microsoft Research, Redmond, WA, United States.
Front Big Data. 2019 Dec 3;2:45. doi: 10.3389/fdata.2019.00045. eCollection 2019.
Since the relaunch of Microsoft Academic Services (MAS) 4 years ago, scholarly communications have undergone dramatic changes: more ideas are being exchanged online, more authors are sharing their data, and more software tools used to make discoveries and reproduce the results are being distributed openly. The sheer amount of information available is too much for any individual to keep up with and digest. Meanwhile, artificial intelligence (AI) technologies have made great strides, and the cost of computing has plummeted to the extent that it has become practical to employ intelligent agents to comprehensively collect and analyze scholarly communications. MAS is one such effort, and this paper describes its progress since the last disclosure. As plenty of independent studies have affirmed the effectiveness of MAS, this paper focuses on the three key AI technologies that underlie its ability to capture scholarly communications with adequate quality and broad coverage: (1) natural language understanding for extracting factoids from individual articles at web scale, (2) knowledge-assisted inference and reasoning for assembling the factoids into a knowledge graph, and (3) a reinforcement learning approach to assessing the scholarly importance of entities participating in scholarly communications, called the saliency, which serves as both an analytic and a predictive metric in MAS. These elements enhance the capabilities of MAS in supporting studies of the science of science based on the GOTO principle, i.e., good and open data with transparent and objective methodologies. The current direction of development and how to access the regularly updated data and tools from MAS, including the knowledge graph, a REST API, and a website, are also described.