Bizzoni Yuri, Degaetano-Ortlieb Stefania, Fankhauser Peter, Teich Elke
Language Science and Technology, Saarland University, Saarbrücken, Germany.
Digital Linguistics, Institut für Deutsche Sprache, Mannheim, Germany.
Front Artif Intell. 2020 Sep 16;3:73. doi: 10.3389/frai.2020.00073. eCollection 2020.
We trace the evolution of Scientific English from the Late Modern period to modern times on the basis of a comprehensive corpus composed of the Transactions and Proceedings of the Royal Society of London, the first and longest-running English scientific journal, established in 1665. Specifically, we explore the linguistic imprints of specialization and diversification in the science domain, which accumulate in the formation of "scientific language" and field-specific sublanguages/registers (chemistry, biology, etc.). We pursue an exploratory, data-driven approach using state-of-the-art computational language models and combine them with selected information-theoretic measures (entropy, relative entropy) to compare models along relevant dimensions of variation (time, register). Focusing on selected linguistic variables (lexis, grammar), we show how we deploy computational language models to capture linguistic variation and change, and we discuss their benefits and limitations.
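To make the two information-theoretic measures named above concrete, the following is a minimal sketch of entropy and relative entropy (Kullback-Leibler divergence) computed over word distributions from two time slices. It is an illustration under stated assumptions, not the authors' pipeline: the additively smoothed unigram model stands in for the more sophisticated language models used in the study, the two toy "time slices" are invented sentences rather than Royal Society Corpus data, and the function names and the smoothing constant `alpha` are hypothetical.

```python
import math
from collections import Counter


def unigram_distribution(tokens, vocabulary, alpha=1.0):
    """Additively smoothed unigram probabilities over a shared vocabulary.

    Smoothing (alpha > 0) keeps every probability non-zero, so relative
    entropy stays finite. The value of alpha is an illustrative assumption.
    """
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(prob * math.log2(prob) for prob in p.values() if prob > 0)


def relative_entropy(p, q):
    """Relative entropy D(p || q) in bits.

    Measures the extra bits needed to encode data drawn from p when using
    a code optimized for q. Assumes q[w] > 0 wherever p[w] > 0.
    """
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)


# Invented toy "time slices" (not corpus data), for illustration only.
early = "i observed that the air was much expanded by the heat".split()
late = "the oxidation rate of the sample was measured at constant temperature".split()

vocabulary = sorted(set(early) | set(late))
p_early = unigram_distribution(early, vocabulary)
p_late = unigram_distribution(late, vocabulary)

print(f"H(early)         = {entropy(p_early):.3f} bits")
print(f"H(late)          = {entropy(p_late):.3f} bits")
print(f"D(late || early) = {relative_entropy(p_late, p_early):.3f} bits")
print(f"D(early || late) = {relative_entropy(p_early, p_late):.3f} bits")
```

Relative entropy is asymmetric, so the direction of comparison matters: D(late || early) asks how poorly a model of the earlier period predicts the later one, which is the kind of question a diachronic comparison along the time dimension poses. The same functions could compare registers by replacing the two time slices with texts from two fields.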