School of Economics and Management, Harbin Institute of Technology (Shenzhen), Shenzhen, China.
World Intellectual Property Organization, Geneva, Switzerland.
PLoS One. 2023 Jun 20;18(6):e0284567. doi: 10.1371/journal.pone.0284567. eCollection 2023.
As novelty is a core value in science, a reliable approach to measuring the novelty of scientific documents is critical. Previous novelty measures, however, have several limitations. First, most previous measures are based on the concept of recombinant novelty, which attempts to identify novel combinations of knowledge elements, while insufficient effort has been made to identify novel elements themselves (element novelty). Second, most previous measures have not been validated, so it is unclear what aspect of newness they measure. Third, some previous measures can be computed only in certain scientific fields because of technical constraints. This study therefore aims to provide a validated and field-universal approach to computing element novelty. We drew on machine learning to develop a word embedding model, which allows us to extract semantic information from text data. Our validation analyses suggest that the word embedding model does convey semantic information. Based on the trained word embedding, we quantified the element novelty of a document by measuring its distance from the rest of the document universe. We then carried out a questionnaire survey to obtain self-reported novelty scores from 800 scientists. We found that our element novelty measure is significantly correlated with self-reported novelty in terms of discovering and identifying new phenomena, substances, molecules, etc., and that this correlation is observed across different scientific fields.
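The idea of scoring a document by its distance from the rest of the document universe can be illustrated with a minimal sketch. The abstract does not specify how document vectors are built or which distance is used, so everything below is an assumption made for illustration: documents are represented as the mean of their word vectors, distance is cosine distance, and the novelty score is the mean distance to all other documents. The toy two-dimensional embeddings and the helper names `document_vector` and `element_novelty` are hypothetical and are not taken from the paper.

```python
import numpy as np


def document_vector(tokens, embeddings):
    """Average the embeddings of the tokens that have a vector (assumed pooling)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        raise ValueError("no token in the document has an embedding")
    return np.mean(vecs, axis=0)


def element_novelty(doc_vecs):
    """Mean cosine distance from each document to every other document.

    This aggregation is an illustrative assumption, not the paper's formula.
    """
    X = np.asarray(doc_vecs, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two documents")
    # Normalise rows so that dot products are cosine similarities.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = X @ X.T
    # Exclude each document's similarity to itself.
    np.fill_diagonal(sim, 0.0)
    mean_sim_to_others = sim.sum(axis=1) / (n - 1)
    return 1.0 - mean_sim_to_others


# Hypothetical toy embeddings; a trained model would supply these in practice.
embeddings = {
    "cell": np.array([1.0, 0.0]),
    "protein": np.array([0.9, 0.1]),
    "gene": np.array([0.8, 0.2]),
    "graphene": np.array([0.0, 1.0]),
}
docs = [
    ["cell", "protein"],
    ["protein", "gene"],
    ["gene", "cell"],
    ["graphene", "cell"],  # contains a term far from the others
]
doc_vecs = [document_vector(d, embeddings) for d in docs]
scores = element_novelty(doc_vecs)
most_novel = int(np.argmax(scores))
```

In this toy corpus the fourth document, the only one containing a term far from the others in embedding space, receives the highest score. A real application would replace the toy dictionary with vectors from a trained word embedding model and would likely need a more scalable neighbour search than a full pairwise similarity matrix.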