Amancio Diego Raphael
Institute of Mathematical and Computer Sciences, University of São Paulo, São Carlos, São Paulo, Brazil.
PLoS One. 2015 Aug 27;10(8):e0136076. doi: 10.1371/journal.pone.0136076. eCollection 2015.
Statistical methods have been widely employed to study the fundamental properties of language. In recent years, methods from complex and dynamical systems have proved useful for creating several language models. Despite the large number of studies devoted to representing texts with physical models, only a few have shown how the properties of the underlying physical systems can be used to improve the performance of natural language processing tasks. In this paper, I address this problem by devising complex network methods that improve the performance of current statistical methods. Using a fuzzy classification strategy, I show that the topological properties extracted from texts complement the traditional textual description. In several cases, hybrid approaches outperformed both the traditional and the networked methods used alone. Because the proposed model is generic, the framework devised here could be straightforwardly applied to similar textual applications where topology plays a pivotal role in the description of the interacting agents.
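The hybrid idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the network model, the topological measurements, or the fuzzy combination rule. The sketch assumes a word adjacency network (consecutive words are linked), uses degree and clustering coefficient as example topological properties, and assumes that class memberships from a traditional classifier and a network-based classifier are merged by a convex combination. All function names and the `weight` parameter are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and keep purely alphabetic whitespace-separated tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def adjacency_network(tokens):
    """Build an undirected network linking each pair of consecutive distinct words.

    Assumption: the abstract does not state how texts are mapped to networks.
    """
    graph = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree(graph, node):
    """Number of distinct neighbours of a word."""
    return len(graph[node])


def clustering(graph, node):
    """Local clustering coefficient: fraction of neighbour pairs that are linked."""
    neigh = list(graph[node])
    k = len(neigh)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neigh[j] in graph[neigh[i]]
    )
    return 2.0 * links / (k * (k - 1))


def hybrid_membership(mu_traditional, mu_network, weight):
    """Convex combination of two fuzzy class-membership dictionaries.

    Assumption: weight in [0, 1]; weight=1 uses only the traditional
    description, weight=0 uses only the network description.
    """
    classes = mu_traditional.keys() | mu_network.keys()
    return {
        c: weight * mu_traditional.get(c, 0.0)
        + (1.0 - weight) * mu_network.get(c, 0.0)
        for c in classes
    }
```

In this sketch, a text would be assigned to the class with the largest hybrid membership, and `weight` controls how much the traditional description counts relative to the topological one.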