Linguistic Variation and Change in 250 Years of English Scientific Writing: A Data-Driven Approach.

Author information

Bizzoni Yuri, Degaetano-Ortlieb Stefania, Fankhauser Peter, Teich Elke

Affiliations

Language Science and Technology, Saarland University, Saarbrücken, Germany.

Digital Linguistics, Institut für Deutsche Sprache, Mannheim, Germany.

Publication information

Front Artif Intell. 2020 Sep 16;3:73. doi: 10.3389/frai.2020.00073. eCollection 2020.

Abstract

We trace the evolution of Scientific English through the Late Modern period to modern time on the basis of a comprehensive corpus composed of the Transactions and Proceedings of the Royal Society of London, the first and longest-running English scientific journal established in 1665. Specifically, we explore the linguistic imprints of specialization and diversification in the science domain which accumulate in the formation of "scientific language" and field-specific sublanguages/registers (chemistry, biology etc.). We pursue an exploratory, data-driven approach using state-of-the-art computational language models and combine them with selected information-theoretic measures (entropy, relative entropy) for comparing models along relevant dimensions of variation (time, register). Focusing on selected linguistic variables (lexis, grammar), we show how we deploy computational language models for capturing linguistic variation and change and discuss benefits and limitations.
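The abstract names entropy and relative entropy as the measures used to compare language models across time and register. As a rough illustration only, the sketch below computes both measures over smoothed unigram distributions from two toy text samples. The unigram model, the additive smoothing, the function names, and the sample sentences are all assumptions made for this example; the abstract does not specify the models or estimation procedure the authors actually use.

```python
import math
from collections import Counter


def unigram_distribution(tokens, vocabulary, alpha=1.0):
    """Additive-smoothed unigram probabilities over a shared vocabulary.

    Smoothing (alpha > 0) keeps every probability nonzero, so the
    relative entropy below is always finite. This is an assumption of
    the sketch, not a detail taken from the paper.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {word: (counts[word] + alpha) / total for word in vocabulary}


def entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(prob * math.log2(prob) for prob in p.values() if prob > 0)


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler divergence) D(p || q) in bits.

    Requires q[w] > 0 wherever p[w] > 0.
    """
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)


# Hypothetical toy samples standing in for an earlier and a later period.
early = "i observed that the air did rise when i heated the vessel".split()
late = "the oxidation rate was measured and the reaction rate was determined".split()

vocabulary = sorted(set(early) | set(late))
p_early = unigram_distribution(early, vocabulary)
p_late = unigram_distribution(late, vocabulary)

# How many extra bits are needed to encode the later sample
# with a model fitted to the earlier one.
print("H(late) =", entropy(p_late))
print("D(late || early) =", relative_entropy(p_late, p_early))
```

Relative entropy is asymmetric, so `relative_entropy(p_late, p_early)` and `relative_entropy(p_early, p_late)` generally differ; which direction is informative depends on which period is treated as the reference.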

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d1b/7861277/f096a53d0200/frai-03-00073-g0001.jpg
