Hierarchical structures induce long-range dynamical correlations in written texts.

Authors

Alvarez-Lacalle E, Dorow B, Eckmann J-P, Moses E

Affiliation

Department of Physics of Complex Systems and Albert Einstein Minerva Center for Theoretical Physics, The Weizmann Institute of Science, Rehovot 76100, Israel.

Publication information

Proc Natl Acad Sci U S A. 2006 May 23;103(21):7956-61. doi: 10.1073/pnas.0510673103. Epub 2006 May 12.

DOI:10.1073/pnas.0510673103

PMID:16698933

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1472411/

Abstract

Thoughts and ideas are multidimensional and often concurrent, yet they can be expressed surprisingly well sequentially by the translation into language. This reduction of dimensions occurs naturally but requires memory and necessitates the existence of correlations, e.g., in written text. However, correlations in word appearance decay quickly, while previous observations of long-range correlations using random walk approaches yield little insight on memory or on semantic context. Instead, we study combinations of words that a reader is exposed to within a "window of attention," spanning about 100 words. We define a vector space of such word combinations by looking at words that co-occur within the window of attention, and analyze its structure. Singular value decomposition of the co-occurrence matrix identifies a basis whose vectors correspond to specific topics, or "concepts" that are relevant to the text. As the reader follows a text, the "vector of attention" traces out a trajectory of directions in this "concept space." We find that memory of the direction is retained over long times, forming power-law correlations. The appearance of power laws hints at the existence of an underlying hierarchical network. Indeed, imposing a hierarchy similar to that defined by volumes, chapters, paragraphs, etc. succeeds in creating correlations in a surrogate random text that are identical to those of the original text. We conclude that hierarchical structures in text serve to create long-range correlations, and use the reader's memory in reenacting some of the multidimensionality of the thoughts being expressed.
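The pipeline outlined in the abstract (window of attention, co-occurrence matrix, singular value decomposition, trajectory in concept space, correlation of directions) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses raw window counts without the normalization a real analysis would need, and the function names, the toy text, the short window, and the number of retained concepts are all assumptions made for illustration.

```python
import numpy as np

def window_counts(tokens, vocab, window=100):
    """Row t holds the vocabulary word counts of the window starting at token t."""
    index = {w: i for i, w in enumerate(vocab)}
    steps = len(tokens) - window + 1
    W = np.zeros((steps, len(vocab)))
    for t in range(steps):
        for tok in tokens[t:t + window]:
            i = index.get(tok)
            if i is not None:
                W[t, i] += 1
    return W

def concept_basis(W, n_concepts):
    """Leading singular vectors of a raw co-occurrence matrix (assumed form: W^T W)."""
    M = W.T @ W                      # symmetric word-by-word co-occurrence counts
    U, s, Vt = np.linalg.svd(M)      # singular values come back in descending order
    return U[:, :n_concepts]

def attention_trajectory(W, basis):
    """Project each window onto the concept basis and keep only the direction."""
    V = W @ basis
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # leave empty windows as zero vectors
    return V / norms

def direction_autocorrelation(traj, lag):
    """Mean dot product between directions separated by `lag` window steps."""
    if lag < 0 or lag >= len(traj):
        raise ValueError("lag must satisfy 0 <= lag < len(traj)")
    n = len(traj) - lag
    return float(np.mean(np.sum(traj[:n] * traj[lag:lag + n], axis=1)))

# Toy text (hypothetical): two "topics" that follow one another.
tokens = ["cat", "dog"] * 50 + ["sun", "moon"] * 50
vocab = ["cat", "dog", "sun", "moon"]

W = window_counts(tokens, vocab, window=10)   # short window for the toy example
basis = concept_basis(W, n_concepts=2)
traj = attention_trajectory(W, basis)

for lag in (1, 10, 150):
    print(lag, direction_autocorrelation(traj, lag))
```

On a real text one would evaluate the autocorrelation over a wide range of lags and inspect it on log-log axes to look for the power-law decay the abstract reports. The toy text here is far too short and regular to show that behavior; it only demonstrates that the direction is retained within a topic and lost across topics.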

Similar articles

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Modeling statistical properties of written text.

PLoS One. 2009;4(4):e5372. doi: 10.1371/journal.pone.0005372. Epub 2009 Apr 29.

Macromolecular crowding: chemistry and physics meet biology (Ascona, Switzerland, 10-14 June 2012).

Phys Biol. 2013 Aug;10(4):040301. doi: 10.1088/1478-3975/10/4/040301. Epub 2013 Aug 2.

Long-Range Memory in Literary Texts: On the Universal Clustering of the Rare Words.

PLoS One. 2016 Nov 28;11(11):e0164658. doi: 10.1371/journal.pone.0164658. eCollection 2016.

Heaps' Law and Heaps functions in tagged texts: evidences of their linguistic relevance.

R Soc Open Sci. 2020 Mar 18;7(3):200008. doi: 10.1098/rsos.200008. eCollection 2020 Mar.

Modeling Long-Range Dynamic Correlations of Words in Written Texts with Hawkes Processes.

Entropy (Basel). 2022 Jun 22;24(7):858. doi: 10.3390/e24070858.

Development of a Consumer Health Vocabulary by Mining Health Forum Texts Based on Word Embedding: Semiautomatic Approach.

JMIR Med Inform. 2019 May 23;7(2):e12704. doi: 10.2196/12704.

Probing the statistical properties of unknown texts: application to the Voynich Manuscript.

PLoS One. 2013 Jul 2;8(7):e67310. doi: 10.1371/journal.pone.0067310. Print 2013.

Size of saccade and fixation duration of eye movements during reading: psychophysics of Japanese text processing.

J Opt Soc Am A. 1992 Jan;9(1):5-13. doi: 10.1364/josaa.9.000005.

Cited by

Punctuation Patterns in by James Joyce Are Largely Translation-Invariant.

Entropy (Basel). 2025 Feb 7;27(2):177. doi: 10.3390/e27020177.

Multifractal Hopscotch in by Julio Cortázar.

Entropy (Basel). 2024 Aug 22;26(8):716. doi: 10.3390/e26080716.

Language-like efficiency and structure in house finch song.

Proc Biol Sci. 2024 Apr 10;291(2020):20240250. doi: 10.1098/rspb.2024.0250. Epub 2024 Apr 3.

Modeling Long-Range Dynamic Correlations of Words in Written Texts with Hawkes Processes.

Entropy (Basel). 2022 Jun 22;24(7):858. doi: 10.3390/e24070858.

Long-range sequential dependencies precede complex syntactic production in language acquisition.

Proc Biol Sci. 2022 Mar 9;289(1970):20212657. doi: 10.1098/rspb.2021.2657.

Toward a Computational Neuroethology of Vocal Communication: From Bioacoustics to Neurophysiology, Emerging Tools and Future Directions.

Front Behav Neurosci. 2021 Dec 20;15:811737. doi: 10.3389/fnbeh.2021.811737. eCollection 2021.

Complexity-entropy analysis at different levels of organisation in written language.

PLoS One. 2019 May 8;14(5):e0214863. doi: 10.1371/journal.pone.0214863. eCollection 2019.

The dynamics of memory retrieval in hierarchical networks.

J Comput Neurosci. 2016 Jun;40(3):247-68. doi: 10.1007/s10827-016-0595-7. Epub 2016 Feb 27.

Languages cool as they expand: allometric scaling and the decreasing need for new words.

Sci Rep. 2012;2:943. doi: 10.1038/srep00943. Epub 2012 Dec 10.

On the origin of long-range correlations in texts.

Proc Natl Acad Sci U S A. 2012 Jul 17;109(29):11582-7. doi: 10.1073/pnas.1117723109. Epub 2012 Jul 2.
