Level statistics of words: finding keywords in literary texts and symbolic sequences.

Author information

Carpena P, Bernaola-Galván P, Hackenberg M, Coronado A V, Oliver J L

Affiliation

Departamento de Física Aplicada II, Universidad de Málaga, 29071 Málaga, Spain.

Publication information

Phys Rev E Stat Nonlin Soft Matter Phys. 2009 Mar;79(3 Pt 2):035102. doi: 10.1103/PhysRevE.79.035102. Epub 2009 Mar 10.

Abstract

Using a generalization of the level statistics analysis of quantum disordered systems, we present an approach that automatically extracts keywords from literary texts. Our approach takes into account not only the frequencies of the words present in the text but also their spatial distribution along the text, and is based on the fact that relevant words are significantly clustered (i.e., they attract each other), while irrelevant words are distributed randomly in the text. Since a reference corpus is not needed, our approach is especially suitable for single documents for which no a priori information is available. In addition, we show that our method also works on generic symbolic sequences (continuous texts without spaces), thus suggesting its general applicability.
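
The abstract does not state which statistic is used to measure clustering. As a minimal sketch of the general idea, one can record the positions of each word in the token stream, take the gaps between successive occurrences, and compare the standard deviation of those gaps with their mean. For a word placed at random the gaps are roughly geometrically distributed and the ratio stays close to 1, while a word that occurs in bursts gives a ratio well above 1. The function name `spacing_scores`, the `min_count` cutoff, and the choice of this particular ratio are assumptions made for illustration, not the authors' exact estimator.

```python
from collections import defaultdict
from statistics import mean, pstdev


def spacing_scores(tokens, min_count=3):
    """Return {word: sigma}, where sigma = std(gaps) / mean(gaps).

    `gaps` are the distances between successive occurrences of a word.
    Words seen fewer than `min_count` times are skipped, because they
    yield too few gaps for the ratio to be meaningful.
    (Illustrative statistic; an assumption, not the paper's estimator.)
    """
    # Collect the positions at which each word occurs.
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)

    scores = {}
    for word, pos in positions.items():
        if len(pos) < min_count:
            continue
        # Distances between consecutive occurrences of the same word.
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        scores[word] = pstdev(gaps) / mean(gaps)
    return scores


def rank_keywords(tokens, min_count=3):
    """Words sorted from most to least clustered under the ratio above."""
    scores = spacing_scores(tokens, min_count)
    return sorted(scores, key=scores.get, reverse=True)
```

Because the score uses only positions inside the one document, no reference corpus is involved, which matches the single-document setting described in the abstract. For a text without spaces, the same functions could be applied to a list of fixed-length substrings instead of words, although how candidate "words" are chosen in that case is not covered by this sketch.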
