Mohseni Mahdi, Gast Volker, Redies Christoph
Experimental Aesthetics Group, Institute of Anatomy I, Jena University Hospital, University of Jena, Jena, Germany.
Department of English and American Studies, University of Jena, Jena, Germany.
Front Psychol. 2021 Mar 31;12:599063. doi: 10.3389/fpsyg.2021.599063. eCollection 2021.
This study investigates global properties of three categories of English text: canonical fiction, non-canonical fiction, and non-fictional texts. The central hypothesis is that there are systematic differences in structural design features between canonical and non-canonical fiction, and between fictional and non-fictional texts. To investigate these differences, we compiled a corpus containing texts of the three categories of interest, the Jena Corpus of Expository and Fictional Prose (JEFP Corpus). Two aspects of global structure are investigated: variability and self-similar (fractal) patterns, which reflect long-range correlations along texts. We use four types of basic observations: (i) the frequency of POS-tags per sentence, (ii) sentence length, (iii) lexical diversity, and (iv) the distribution of topic probabilities in segments of texts. These basic observations are grouped into two more general categories: (a) the lower-level properties (i) and (ii), which are observed at the level of the sentence (reflecting linguistic decoding), and (b) the higher-level properties (iii) and (iv), which are observed at the textual level (reflecting comprehension/integration). The observations for each property are transformed into series, which are analyzed in terms of variance and subjected to Multi-Fractal Detrended Fluctuation Analysis (MFDFA), giving rise to three statistics: (i) the degree of fractality, (ii) the degree of multifractality, i.e., the width of the fractal spectrum, and (iii) the degree of asymmetry of the fractal spectrum. The statistics thus obtained are compared individually across text categories and jointly fed into a classification model (Support Vector Machine). Our results show that the three text categories of interest do differ. In general, lower-level text properties are better discriminators than higher-level text properties. Canonical fictional texts differ from non-canonical ones primarily in terms of variability in lower-level text properties. Fractality seems to be a universal feature of text, slightly more pronounced in non-fictional than in fictional texts. Based on these corpus-derived results, we point out some avenues for future research leading toward a more comprehensive analysis of textual aesthetics, e.g., using experimental methodologies.
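To make the analysis pipeline described above more concrete, the following is a minimal sketch of how a series of per-sentence observations (e.g., sentence lengths) might be passed through MFDFA to obtain a degree of fractality and a spectrum width. It is not the authors' implementation. The function name `mfdfa_summary`, the scales, the range of `q` values, the detrending order, and the synthetic input are all assumptions chosen for illustration, and the asymmetry statistic is omitted because its exact definition is not given in the abstract.

```python
import numpy as np

def mfdfa_summary(series, scales=(16, 32, 64, 128, 256), q_values=None, order=1):
    """Illustrative MFDFA: returns (h(2), spectrum width) for a 1-D series.

    All defaults are assumptions made for this sketch, not values from the study.
    """
    if q_values is None:
        q_values = np.linspace(-4, 4, 17)  # step 0.5, includes q = 0 and q = 2
    q_values = np.asarray(q_values, dtype=float)
    x = np.asarray(series, dtype=float)

    # Step 1: profile (cumulative sum of the mean-centred series).
    profile = np.cumsum(x - x.mean())
    n = len(profile)

    log_f = np.empty((len(q_values), len(scales)))
    for j, s in enumerate(scales):
        n_seg = n // s
        # Step 2: non-overlapping windows taken from the start and from the end.
        segs = np.concatenate([
            profile[:n_seg * s].reshape(n_seg, s),
            profile[n - n_seg * s:].reshape(n_seg, s),
        ])
        # Step 3: remove a polynomial trend from every window.
        t = np.arange(s)
        coeffs = np.polyfit(t, segs.T, order)            # shape (order + 1, windows)
        trend = (np.vander(t, order + 1) @ coeffs).T     # shape (windows, s)
        var = np.mean((segs - trend) ** 2, axis=1)
        # Step 4: q-th order fluctuation function, stored as log F_q(s).
        for i, q in enumerate(q_values):
            if np.isclose(q, 0.0):
                log_f[i, j] = 0.5 * np.mean(np.log(var))
            else:
                log_f[i, j] = np.log(np.mean(var ** (q / 2.0))) / q

    # Step 5: generalized Hurst exponents h(q) as log-log slopes.
    log_s = np.log(np.asarray(scales, dtype=float))
    h_q = np.array([np.polyfit(log_s, row, 1)[0] for row in log_f])

    # Step 6: singularity strengths alpha = d tau / d q, with tau(q) = q h(q) - 1.
    tau = q_values * h_q - 1.0
    alpha = np.gradient(tau, q_values)
    width = alpha.max() - alpha.min()

    h2 = h_q[np.argmin(np.abs(q_values - 2.0))]
    return float(h2), float(width)

# Synthetic stand-in for a sentence-length series (not data from the JEFP Corpus).
rng = np.random.default_rng(0)
sentence_lengths = rng.poisson(20, size=4096)
h2, width = mfdfa_summary(sentence_lengths)
print(f"h(2) = {h2:.3f}, spectrum width = {width:.3f}")
```

For an uncorrelated series such as the synthetic one used here, h(2) should lie near 0.5, whereas long-range correlated series yield larger values. In a setup like the one described in the abstract, such statistics would be computed for each text property and then combined as features for a classifier.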