Sun Kun, Wang Rong, Xiong Wenxin
Department of Linguistics, University of Tübingen, Tübingen, Germany.
Institute of Computational Linguistics, University of Stuttgart, Stuttgart, Germany.
Corpus Linguist Linguist Theory. 2021 Feb 25;17(3):599-624. doi: 10.1515/cllt-2020-0064. eCollection 2021 Nov.
The notion of genre has been widely explored using quantitative methods from both lexical and syntactic perspectives. However, discourse structure, which is mostly concerned with the interrelation of discourse units and can play a crucial role in genre analysis, has rarely been used to examine genre, and few quantitative studies have explored genre distinctions from this perspective. Here, we use two English discourse corpora (RST-DT and GUM) to investigate discourse structure from a novel viewpoint. The RST-DT is divided into four small subcorpora distinguished according to genre, and another corpus (GUM), containing seven genres, is used for cross-verification. Each RST (rhetorical structure theory) tree is converted into a dependency representation using information from the RST annotations, and discourse distance is then calculated through a process similar to that used to calculate syntactic dependency distance. Moreover, the dependency representations derived from the two corpora are readily convertible into network data. We then examine the different genres in the two corpora by combining discourse distance and discourse network. The two methods complement each other in revealing the distinctiveness of the various genres. Accordingly, we propose an effective quantitative method for assessing genre differences using discourse distance and discourse network. This quantitative study can help us better understand the nature of genre.
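The two measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that discourse distance mirrors the usual mean syntactic dependency distance (the average absolute gap between the linear positions of a head and its dependent), applied here to elementary discourse units (EDUs), and that the network takes EDUs as nodes. The toy edge list, the relation labels, and the function names are all hypothetical.

```python
# Hypothetical toy input: (head_edu, dependent_edu, relation) triples,
# with EDUs numbered by their linear position in the text.
edges = [
    (1, 2, "elaboration"),
    (1, 4, "contrast"),
    (4, 3, "attribution"),
    (4, 5, "elaboration"),
]


def mean_discourse_distance(edges):
    """Mean absolute positional gap between head and dependent EDUs.

    Assumption: this follows the common definition of mean syntactic
    dependency distance; the abstract does not spell out the formula.
    """
    if not edges:
        raise ValueError("no dependency edges")
    return sum(abs(head - dep) for head, dep, _ in edges) / len(edges)


def to_network(edges):
    """Directed adjacency map: head EDU -> set of dependent EDUs."""
    graph = {}
    for head, dep, _ in edges:
        graph.setdefault(head, set()).add(dep)
        graph.setdefault(dep, set())  # register leaves as nodes too
    return graph


def out_degrees(graph):
    """Number of dependents attached to each EDU."""
    return {node: len(children) for node, children in graph.items()}


mdd = mean_discourse_distance(edges)
graph = to_network(edges)
degrees = out_degrees(graph)
```

In this toy example the four edges span gaps of 1, 3, 1, and 1 positions. Comparing such averages, together with network statistics like the degree distribution, across genre subcorpora is the kind of contrast the abstract describes.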