Mohseni Mahdi, Redies Christoph, Gast Volker
Department of English and American Studies, University of Jena, 07743 Jena, Germany.
Experimental Aesthetics Group, Institute of Anatomy I, Jena University Hospital, University of Jena, 07743 Jena, Germany.
Entropy (Basel). 2022 Feb 15;24(2):278. doi: 10.3390/e24020278.
Computational textual aesthetics aims to study observable differences between aesthetic categories of text. We use Approximate Entropy to measure the (un)predictability in two aesthetic text categories, i.e., canonical fiction ('classics') and non-canonical fiction (with lower prestige). Approximate Entropy is determined for series derived from sentence-length values and from the distribution of part-of-speech tags in windows of text. For comparison, we also include a sample of non-fictional texts. Moreover, we use Shannon Entropy to estimate degrees of (un)predictability due to frequency distributions in the entire text. Our results show that Approximate Entropy values differentiate canonical from non-canonical texts better than Shannon Entropy does; this advantage does not hold for the classification of fictional vs. expository prose. Canonical and non-canonical texts thus differ in sequential structure, while inter-genre differences are a matter of the overall distribution of local frequencies. We conclude that canonical fictional texts exhibit a higher degree of (sequential) unpredictability than non-canonical texts, corresponding to the popular assumption that they are more 'demanding' and 'richer'. In using Approximate Entropy, we propose a new method for text classification in the context of computational textual aesthetics.
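To make the contrast between the two measures concrete, the following is a minimal sketch of Approximate Entropy (in its standard formulation with self-matches counted) and Shannon Entropy. It is not the authors' implementation: the function names, the embedding dimension m = 2, the tolerance r = 0.2 × standard deviation, and the sample sentence lengths are all assumptions chosen for illustration, since the abstract does not state the parameters used.

```python
import math
from collections import Counter


def approximate_entropy(series, m=2, r=None):
    """Approximate Entropy of a numeric series.

    Assumptions (not taken from the abstract): m=2 and r=0.2*SD,
    which are commonly used defaults for this measure.
    """
    n = len(series)
    if n < m + 2:
        raise ValueError("series is too short for the chosen m")
    if r is None:
        mean = sum(series) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
        r = 0.2 * sd

    def phi(dim):
        # All overlapping templates of length `dim`.
        count = n - dim + 1
        templates = [series[i:i + dim] for i in range(count)]
        total = 0.0
        for a in templates:
            # Chebyshev distance; the self-match keeps the log argument > 0.
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / count)
        return total / count

    return phi(m) - phi(m + 1)


def shannon_entropy(symbols):
    """Shannon Entropy (in bits) of the frequency distribution of symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical sentence lengths (in tokens), invented for illustration.
    regular = [10, 20] * 20
    varied = [12, 31, 7, 19, 44, 9, 15, 27, 5, 38] * 4
    print("ApEn regular:", approximate_entropy(regular))
    print("ApEn varied: ", approximate_entropy(varied))
    print("Shannon (POS tags):", shannon_entropy(["NOUN", "VERB", "NOUN", "DET"]))
```

The design point the sketch illustrates is that Shannon Entropy depends only on how often each value occurs, so shuffling a sequence leaves it unchanged, whereas Approximate Entropy depends on the order of values and therefore captures the sequential structure discussed above.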