Level statistics of words: finding keywords in literary texts and symbolic sequences.

Authors

Carpena P, Bernaola-Galván P, Hackenberg M, Coronado A V, Oliver J L

Affiliation

Departamento de Física Aplicada II, Universidad de Málaga, 29071 Málaga, Spain.

Publication information

Phys Rev E Stat Nonlin Soft Matter Phys. 2009 Mar;79(3 Pt 2):035102. doi: 10.1103/PhysRevE.79.035102. Epub 2009 Mar 10.

DOI:10.1103/PhysRevE.79.035102
PMID:19392005
Abstract

Using a generalization of the level statistics analysis of quantum disordered systems, we present an approach able to extract automatically keywords in literary texts. Our approach takes into account not only the frequencies of the words present in the text but also their spatial distribution along the text, and is based on the fact that relevant words are significantly clustered (i.e., they self-attract each other), while irrelevant words are distributed randomly in the text. Since a reference corpus is not needed, our approach is especially suitable for single documents for which no a priori information is available. In addition, we show that our method works also in generic symbolic sequences (continuous texts without spaces), thus suggesting its general applicability.
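
The abstract describes the idea only in words, so the following is a minimal sketch of one way to score clustering, not the authors' exact procedure. It assumes that clustering can be measured by the standard deviation of the gaps between successive occurrences of a word, divided by the mean gap. For a word placed at random with occurrence probability p, the gaps follow a geometric distribution whose standard-deviation-to-mean ratio is sqrt(1 - p), so dividing by that factor gives a score near 1 for random placement and above 1 for clustered words. The function name `clustering_scores`, the `min_count` threshold, and the tokenizer regex are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import sqrt
import re


def clustering_scores(text, min_count=5):
    """Return {word: score}; higher scores suggest stronger clustering.

    Illustrative sketch only: the score is the coefficient of variation of
    the gaps between successive occurrences of a word, divided by
    sqrt(1 - p), the value expected for geometrically distributed gaps.
    """
    # Assumed tokenizer: lowercase alphabetic runs (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)

    # Record the token index of every occurrence of every word.
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)

    scores = {}
    for word, pos in positions.items():
        count = len(pos)
        # Skip rare words (too few gaps) and the degenerate case p == 1.
        if count < max(min_count, 2) or count == total:
            continue
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        mean_gap = sum(gaps) / len(gaps)
        variance = sum((g - mean_gap) ** 2 for g in gaps) / len(gaps)
        sigma = sqrt(variance) / mean_gap   # spread of gaps relative to mean
        p = count / total                   # occurrence probability
        scores[word] = sigma / sqrt(1 - p)  # ~1 expected for random placement
    return scores
```

Sorting the returned dictionary by score, highest first, gives a candidate keyword ranking. No reference corpus is involved, which matches the single-document setting described in the abstract. The sketch requires word boundaries; handling continuous sequences without spaces would need an additional step that is not shown here.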

Similar articles

1
Level statistics of words: finding keywords in literary texts and symbolic sequences.
Phys Rev E Stat Nonlin Soft Matter Phys. 2009 Mar;79(3 Pt 2):035102. doi: 10.1103/PhysRevE.79.035102. Epub 2009 Mar 10.
2
Clustering of DNA words and biological function: a proof of principle.
J Theor Biol. 2012 Mar 21;297:127-36. doi: 10.1016/j.jtbi.2011.12.024. Epub 2011 Dec 30.
3
Distinguishing Functional DNA Words; A Method for Measuring Clustering Levels.
Sci Rep. 2017 Jan 27;7:41543. doi: 10.1038/srep41543.
4
On the unsupervised analysis of domain-specific Chinese texts.
Proc Natl Acad Sci U S A. 2016 May 31;113(22):6154-9. doi: 10.1073/pnas.1516510113. Epub 2016 May 16.
5
Japanese sound-symbolic words in global contexts: from translation to hybridization.
F1000Res. 2021 Oct 8;10:1024. doi: 10.12688/f1000research.55546.2. eCollection 2021.
6
[A new approach to the study of statistical properties of genetic sequences].
Biofizika. 1993 Sep-Oct;38(5):762-7.
7
Heaps' Law and Heaps functions in tagged texts: evidences of their linguistic relevance.
R Soc Open Sci. 2020 Mar 18;7(3):200008. doi: 10.1098/rsos.200008. eCollection 2020 Mar.
8
[Systematic analysis of the readability of patient information on the websites of clinics for plastic surgery].
Handchir Mikrochir Plast Chir. 2014 Dec;46(6):369-74. doi: 10.1055/s-0034-1385936. Epub 2014 Nov 20.
9
Jung, the trickster writer, or what literary research can do for the clinician.
J Anal Psychol. 2006 Apr;51(2):285-99. doi: 10.1111/j.0021-8774.2006.00588.x.
10
Predictive keywords: Using machine learning to explain document characteristics.
Front Artif Intell. 2023 Jan 5;5:975729. doi: 10.3389/frai.2022.975729. eCollection 2022.

Cited by

1
Distinguishing Functional DNA Words; A Method for Measuring Clustering Levels.
Sci Rep. 2017 Jan 27;7:41543. doi: 10.1038/srep41543.
2
Model of the Dynamic Construction Process of Texts and Scaling Laws of Words Organization in Language Systems.
PLoS One. 2016 Dec 22;11(12):e0168971. doi: 10.1371/journal.pone.0168971. eCollection 2016.
3
Extracting DNA words based on the sequence features: non-uniform distribution and integrity.
Theor Biol Med Model. 2016 Jan 25;13:2. doi: 10.1186/s12976-016-0028-3.
4
A Complex Network Approach to Stylometry.
PLoS One. 2015 Aug 27;10(8):e0136076. doi: 10.1371/journal.pone.0136076. eCollection 2015.
5
The Fractal Patterns of Words in a Text: A Method for Automatic Keyword Extraction.
PLoS One. 2015 Jun 19;10(6):e0130617. doi: 10.1371/journal.pone.0130617. eCollection 2015.
6
An improved alignment-free model for DNA sequence similarity metric.
BMC Bioinformatics. 2014 Sep 28;15(1):321. doi: 10.1186/1471-2105-15-321.
7
Keywords and Co-Occurrence Patterns in the Voynich Manuscript: An Information-Theoretic Analysis.
PLoS One. 2013 Jun 21;8(6):e66344. doi: 10.1371/journal.pone.0066344. Print 2013.
8
Segmentation of time series with long-range fractal correlations.
Eur Phys J B. 2012 Jun 1;85(6). doi: 10.1140/epjb/e2012-20969-5.
9
WordCluster: detecting clusters of DNA words and genomic elements.
Algorithms Mol Biol. 2011 Jan 24;6:2. doi: 10.1186/1748-7188-6-2.
10
Zipf's law leads to Heaps' law: analyzing their relation in finite-size systems.
PLoS One. 2010 Dec 2;5(12):e14139. doi: 10.1371/journal.pone.0014139.