Identify novel elements of knowledge with word embedding.

Affiliations

School of Economics and Management, Harbin Institute of Technology (Shenzhen), Shenzhen, China.

World Intellectual Property Organization, Geneva, Switzerland.

Publication information

PLoS One. 2023 Jun 20;18(6):e0284567. doi: 10.1371/journal.pone.0284567. eCollection 2023.

Abstract

As novelty is a core value in science, a reliable approach to measuring the novelty of scientific documents is critical. However, previous novelty measures have a few limitations.

First, most previous measures are based on the concept of recombinant novelty and attempt to identify novel combinations of knowledge elements. Comparatively little effort has been made to identify a novel element itself (element novelty).

Second, most previous measures have not been validated, so it is unclear which aspect of newness they capture.

Third, some previous measures can be computed only in certain scientific fields because of technical constraints.

This study therefore aims to provide a validated, field-universal approach to computing element novelty. We drew on machine learning to develop a word embedding model, which allows us to extract semantic information from text data. Our validation analyses suggest that the word embedding model does convey semantic information. Based on the trained word embedding, we quantified the element novelty of a document by measuring its distance from the rest of the document universe. We then carried out a questionnaire survey to obtain self-reported novelty scores from 800 scientists. We found that our element novelty measure is significantly correlated with self-reported novelty in terms of discovering and identifying new phenomena, substances, molecules, etc. This correlation is observed across different scientific fields.
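The abstract says only that element novelty is a document's distance from the rest of the document universe in a trained word embedding space. It does not say how document vectors are built or which distance is used. The following is a minimal sketch under two labeled assumptions, not the authors' method. First, a document vector is the mean of its in-vocabulary word vectors. Second, novelty is the mean cosine distance from that vector to every other document vector. The function names (mean_vector, cosine_distance, element_novelty) and the toy two-dimensional embeddings are hypothetical; a real model would have hundreds of dimensions.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(tokens: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Assumption: a document vector is the average of its known word vectors."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        raise ValueError("document has no in-vocabulary tokens")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def element_novelty(
    index: int, docs: Sequence[Sequence[str]], embeddings: Dict[str, Vector]
) -> float:
    """Hypothetical score: mean cosine distance from docs[index] to all other docs."""
    vectors = [mean_vector(d, embeddings) for d in docs]
    target = vectors[index]
    others = [v for j, v in enumerate(vectors) if j != index]
    return sum(cosine_distance(target, v) for v in others) / len(others)


if __name__ == "__main__":
    # Toy, made-up embeddings: three biology-like words point one way, "graphene" another.
    toy_embeddings = {
        "protein": [1.0, 0.0],
        "enzyme": [0.9, 0.1],
        "cell": [0.8, 0.2],
        "graphene": [0.0, 1.0],
    }
    toy_docs = [["protein", "enzyme"], ["enzyme", "cell"], ["protein", "cell"], ["graphene"]]
    for i in range(len(toy_docs)):
        print(i, round(element_novelty(i, toy_docs, toy_embeddings), 3))
```

In this toy corpus the last document, whose only word lies far from the others, gets the highest score. Averaging over all other documents is only one choice. Alternatives such as distance to the centroid of the rest or to the k nearest neighbours would also fit the abstract's wording, and they weight outliers differently.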
