Mariani Joseph, Francopoulo Gil, Paroubek Patrick, Vernier Frédéric
Université Paris-Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique, Orsay, France.
Tagmatica, Paris, France.
Front Res Metr Anal. 2022 Jul 27;7:863126. doi: 10.3389/frma.2022.863126. eCollection 2022.
This paper analyzes the changes in the fields of speech and natural language processing over the most recent 5 years (2016-2020). It continues a series of two papers that we published in 2019 on the analysis of the NLP4NLP corpus, which contained articles published in 34 major conferences and journals in the field of speech and natural language processing over a period of 50 years (1965-2015). The corpus was analyzed with methods developed in the field of NLP, hence its name. The extended NLP4NLP+5 corpus now covers 55 years and comprises close to 90,000 documents (+30% compared with NLP4NLP; as many articles were published in the single year 2020 as over the first 25 years, 1965-1989), 67,000 authors (+40%), 590,000 references (+80%), and approximately 380 million words (+40%). The analyses are conducted globally or comparatively among sources, and also in comparison with the general scientific literature, with a focus on the past 5 years. The study identifies profound changes in research topics, the emergence of a new generation of authors, and the appearance of new publications around artificial intelligence, neural networks, machine learning, and word embeddings.