Detection of changes in literary writing style using N-grams as style markers and supervised machine learning.

Affiliations

Tecnológico Nacional de México (TecNM), Campus Tuxtla Gutierrez, Chiapas, Mexico.

Instituto Politécnico Nacional (IPN), Ciudad de México, Mexico.

Publication information

PLoS One. 2022 Jul 20;17(7):e0267590. doi: 10.1371/journal.pone.0267590. eCollection 2022.

Abstract

The analysis of an author's writing style involves characterizing and identifying the style in terms of a set of features commonly called linguistic features. The analysis can be extrinsic, where the style of an author is compared with that of other authors, or intrinsic, where the style of an author is identified across different stages of their life. Intrinsic analysis has been used, for example, to detect mental illness and the effects of aging. A key element of the analysis is the set of style markers used to model the author's writing patterns. The style markers should handle diachronic changes and be independent of topic. One of the most commonly used style markers in extrinsic style analysis is the n-gram. In this paper, we present the evaluation of traditional n-grams (words and characters) and dependency tree syntactic n-grams for the task of detecting changes in writing style over time. Our corpus consisted of novels by eleven English-speaking authors. The novels of each author were ordered chronologically, from the oldest to the most recent work, according to the date of publication. Subsequently, two stages were defined: initial and final. Three novels were assigned to each stage: the initial stage contained the oldest novels and the final stage the most recent. To analyze changes in writing style, the novels were characterized using four types of n-grams: character, word, Part-Of-Speech (POS) tag, and syntactic relation n-grams. Experiments were performed with a Logistic Regression classifier. Dimensionality reduction techniques such as Principal Component Analysis (PCA) and Latent Semantic Analysis (LSA) were evaluated. The results obtained with the different n-grams indicated that all authors presented significant changes in writing style over time. In addition, representations using n-grams of syntactic relations achieved competitive results across authors.
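The stage assignment and the traditional n-gram markers described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`split_stages`, `char_ngrams`, `word_ngrams`), the whitespace tokenization, and the sample publication years and titles are all assumptions made for illustration. POS-tag and syntactic n-grams are omitted because they require a tagger and a dependency parser.

```python
from collections import Counter


def split_stages(novels, per_stage=3):
    """Order (year, title) pairs by publication year and return the
    `per_stage` oldest works (initial stage) and the `per_stage` most
    recent works (final stage). Hypothetical helper, not from the paper."""
    ordered = sorted(novels, key=lambda novel: novel[0])
    if len(ordered) < 2 * per_stage:
        raise ValueError("need at least 2 * per_stage novels per author")
    return ordered[:per_stage], ordered[-per_stage:]


def char_ngrams(text, n):
    """Count overlapping character n-grams in `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count overlapping word n-grams; tokens are lowercased and split on
    whitespace (a simplifying assumption)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical bibliography for one author: (publication year, title).
novels = [
    (1850, "A"), (1861, "D"), (1853, "B"), (1870, "F"),
    (1857, "C"), (1866, "E"), (1859, "G"),
]
initial, final = split_stages(novels)

# Frequency profiles like these would be the feature vectors for a classifier.
char_profile = char_ngrams("abab", 2)
word_profile = word_ngrams("The cat the cat", 2)
```

In a full experiment, each novel's n-gram counts would be turned into a feature vector labeled with its stage, and a classifier such as logistic regression would be trained to separate the two stages. How well it separates them indicates how much the style changed.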

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8392/9299308/f06873fd3d36/pone.0267590.g001.jpg
