
Hierarchical structures induce long-range dynamical correlations in written texts.

Authors

Alvarez-Lacalle E, Dorow B, Eckmann J-P, Moses E

Affiliation

Department of Physics of Complex Systems and Albert Einstein Minerva Center for Theoretical Physics, The Weizmann Institute of Science, Rehovot 76100, Israel.

Publication details

Proc Natl Acad Sci U S A. 2006 May 23;103(21):7956-61. doi: 10.1073/pnas.0510673103. Epub 2006 May 12.

Abstract

Thoughts and ideas are multidimensional and often concurrent, yet they can be expressed surprisingly well sequentially by the translation into language. This reduction of dimensions occurs naturally but requires memory and necessitates the existence of correlations, e.g., in written text. However, correlations in word appearance decay quickly, while previous observations of long-range correlations using random walk approaches yield little insight on memory or on semantic context. Instead, we study combinations of words that a reader is exposed to within a "window of attention," spanning about 100 words. We define a vector space of such word combinations by looking at words that co-occur within the window of attention, and analyze its structure. Singular value decomposition of the co-occurrence matrix identifies a basis whose vectors correspond to specific topics, or "concepts" that are relevant to the text. As the reader follows a text, the "vector of attention" traces out a trajectory of directions in this "concept space." We find that memory of the direction is retained over long times, forming power-law correlations. The appearance of power laws hints at the existence of an underlying hierarchical network. Indeed, imposing a hierarchy similar to that defined by volumes, chapters, paragraphs, etc. succeeds in creating correlations in a surrogate random text that are identical to those of the original text. We conclude that hierarchical structures in text serve to create long-range correlations, and use the reader's memory in reenacting some of the multidimensionality of the thoughts being expressed.
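
The pipeline described in the abstract (co-occurrence counts within a window of attention, singular value decomposition, and a trajectory of directions in concept space) can be illustrated with a minimal sketch. The abstract does not give the authors' implementation details, so everything here is an assumption made for illustration: the function names, the toy text, the window of 6 tokens (the paper uses about 100 words), the unweighted counts, and the use of a mean dot product between unit vectors as the correlation measure.

```python
import numpy as np

def cooccurrence_matrix(tokens, vocab, window):
    """Count pairs of distinct vocabulary words lying within `window` tokens of each other."""
    index = {w: i for i, w in enumerate(vocab)}
    ids = [index.get(t, -1) for t in tokens]
    M = np.zeros((len(vocab), len(vocab)))
    for pos, a in enumerate(ids):
        if a < 0:
            continue
        # Look ahead only; symmetry is enforced by updating both entries.
        for b in ids[pos + 1 : pos + window]:
            if b >= 0 and b != a:
                M[a, b] += 1
                M[b, a] += 1
    return M

def attention_trajectory(tokens, vocab, basis, window):
    """Project each sliding window's word counts onto the basis and normalise to unit length."""
    index = {w: i for i, w in enumerate(vocab)}
    n_steps = len(tokens) - window + 1
    traj = np.zeros((n_steps, basis.shape[1]))
    for t in range(n_steps):
        counts = np.zeros(len(vocab))
        for w in tokens[t : t + window]:
            i = index.get(w)
            if i is not None:
                counts[i] += 1
        v = counts @ basis
        norm = np.linalg.norm(v)
        if norm > 0:
            traj[t] = v / norm
    return traj

def direction_autocorrelation(traj, max_lag):
    """Mean dot product between direction vectors separated by lags 1..max_lag."""
    return np.array([
        np.mean(np.sum(traj[:-lag] * traj[lag:], axis=1))
        for lag in range(1, max_lag + 1)
    ])

# Toy text with two "topics" (hypothetical data, not from the paper).
tokens = ("cat dog pet " * 20 + "star moon sky " * 20).split()
vocab = sorted(set(tokens))
window = 6  # assumed small window for the toy text

M = cooccurrence_matrix(tokens, vocab, window)
U, s, Vt = np.linalg.svd(M)
basis = U[:, :2]  # keep the two leading singular vectors as "concepts"

traj = attention_trajectory(tokens, vocab, basis, window)
corr = direction_autocorrelation(traj, max_lag=10)
print(corr)
```

On a real text one would plot `corr` against the lag on log-log axes to look for the power-law decay the abstract reports; the toy text above is far too short and regular to show it.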

Similar articles

1
Hierarchical structures induce long-range dynamical correlations in written texts.
Proc Natl Acad Sci U S A. 2006 May 23;103(21):7956-61. doi: 10.1073/pnas.0510673103. Epub 2006 May 12.
3
Modeling statistical properties of written text.
PLoS One. 2009;4(4):e5372. doi: 10.1371/journal.pone.0005372. Epub 2009 Apr 29.

Cited by

3
Language-like efficiency and structure in house finch song.
Proc Biol Sci. 2024 Apr 10;291(2020):20240250. doi: 10.1098/rspb.2024.0250. Epub 2024 Apr 3.
8
The dynamics of memory retrieval in hierarchical networks.
J Comput Neurosci. 2016 Jun;40(3):247-68. doi: 10.1007/s10827-016-0595-7. Epub 2016 Feb 27.
10
On the origin of long-range correlations in texts.
Proc Natl Acad Sci U S A. 2012 Jul 17;109(29):11582-7. doi: 10.1073/pnas.1117723109. Epub 2012 Jul 2.

References

1
Entropy of dialogues creates coherent structures in e-mail traffic.
Proc Natl Acad Sci U S A. 2004 Oct 5;101(40):14333-7. doi: 10.1073/pnas.0405728101. Epub 2004 Sep 24.
2
Hierarchical organization in complex networks.
Phys Rev E Stat Nonlin Soft Matter Phys. 2003 Feb;67(2 Pt 2):026112. doi: 10.1103/PhysRevE.67.026112. Epub 2003 Feb 14.
3
Computational and evolutionary aspects of language.
Nature. 2002 Jun 6;417(6889):611-7. doi: 10.1038/nature00771.
4
Curvature of co-links uncovers hidden thematic layers in the World Wide Web.
Proc Natl Acad Sci U S A. 2002 Apr 30;99(9):5825-9. doi: 10.1073/pnas.032093399. Epub 2002 Apr 23.
5
Long-range correlations in nucleotide sequences.
Nature. 1992 Mar 12;356(6365):168-70. doi: 10.1038/356168a0.
