Linguistic complexity of protein sequences as compared to texts of human languages.

Author information

Popov O, Segal D M, Trifonov E N

Affiliation

Faculty of Humanities, Hebrew University of Jerusalem, Israel.

Publication information

Biosystems. 1996;38(1):65-74. doi: 10.1016/0303-2647(95)01568-x.

Abstract

A notion and a measure of linguistic complexity introduced earlier (Trifonov, 1990) were originally used for the analysis of nucleotide sequences. This measure was shown to reflect the multiplicity of codes (messages) of different natures that are superimposed in the sequences. Unlike human language texts, genetic texts are 'read' by cellular mechanisms in several different ways, each time using a different selection of the characters of the same text while skipping others (Trifonov, 1989). Human texts are read in only one way: sequentially, involving all characters (one code). The conceptual significance and essence of the idea of multiple overlapping codes in genetic sequences, as opposed to human languages, are discussed. The linguistic complexity technique allows the structural complexity of any linear sequence of characters to be calculated, irrespective of whether the text is understood or still undeciphered. The texts (sequences) are compared exclusively in terms of their structural complexity, with no reference to their meaning, which is beyond the scope of this article. Results of such a comparison of protein sequences with various texts written in English, Italian and Welsh are presented. The human texts are found to be structurally simpler than genetic (protein) texts, apparently reflecting a difference in reading modes: a single code versus many codes.
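
The abstract does not spell out how linguistic complexity is computed. A commonly cited formulation of Trifonov's (1990) measure takes, for each word length k, the vocabulary usage U_k: the number of distinct k-letter words that actually occur in the sequence divided by the largest number that could occur, min(A^k, N - k + 1), for an alphabet of size A and a sequence of length N. The complexity is then the product U_1 × U_2 × ... × U_w. The Python sketch below assumes that formulation. The function names, the maximum word length w and the alphabet sizes (for example 20 for amino acids, or an assumed 27 for lowercase English letters plus space) are illustrative choices and not necessarily the exact variant, preprocessing or windowing used by Popov et al.

```python
from math import prod


def vocabulary_usage(seq: str, k: int, alphabet_size: int) -> float:
    """U_k: distinct k-letter words observed / maximum possible distinct k-letter words."""
    n = len(seq)
    if not 1 <= k <= n:
        raise ValueError("word length k must be between 1 and len(seq)")
    # Count the distinct k-letter words among all overlapping windows.
    observed = len({seq[i:i + k] for i in range(n - k + 1)})
    # The maximum is bounded by the alphabet (A**k) and by the number of windows (n - k + 1).
    possible = min(alphabet_size ** k, n - k + 1)
    return observed / possible


def linguistic_complexity(seq: str, alphabet_size: int, max_word: int) -> float:
    """Hypothetical helper for the assumed formulation LC = U_1 * U_2 * ... * U_w, w = max_word."""
    return prod(vocabulary_usage(seq, k, alphabet_size) for k in range(1, max_word + 1))


if __name__ == "__main__":
    print(linguistic_complexity("AAAAAAAA", alphabet_size=20, max_word=2))    # 1/56
    print(linguistic_complexity("ACDEFGHIKL", alphabet_size=20, max_word=3))  # 1.0
    print(linguistic_complexity("abababab", alphabet_size=2, max_word=3))     # 1/6
```

A maximally repetitive string scores low (for "AAAAAAAA" with A = 20 and w = 2 the result is 1/8 × 1/7 = 1/56), whereas a string in which no word repeats, such as "ACDEFGHIKL" with w = 3, scores 1.0. Because the denominator can depend on A, comparing protein sequences with human-language texts in this way requires fixing an alphabet size and a maximum word length for each kind of text, which is a further assumption of this sketch.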
