Rao Rajesh P N, Yadav Nisha, Vahia Mayank N, Joglekar Hrishikesh, Adhikari R, Mahadevan Iravatham
Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA.
Proc Natl Acad Sci U S A. 2009 Aug 18;106(33):13685-90. doi: 10.1073/pnas.0906237106. Epub 2009 Aug 5.
Although no historical information exists about the Indus civilization (flourished ca. 2600-1900 B.C.), archaeologists have uncovered about 3,800 short samples of a script that was used throughout the civilization. The script remains undeciphered, despite a large number of attempts and claimed decipherments over the past 80 years. Here, we propose the use of probabilistic models to analyze the structure of the Indus script. The goal is to reveal, through probabilistic analysis, syntactic patterns that could point the way to eventual decipherment. We illustrate the approach using a simple Markov chain model to capture sequential dependencies between signs in the Indus script. The trained model allows new sample texts to be generated, revealing recurring patterns of signs that could potentially form functional subunits of a possible underlying language. The model also provides a quantitative way of testing whether a particular string belongs to the putative language as captured by the Markov model. Application of this test to Indus seals found in Mesopotamia and other sites in West Asia reveals that the script may have been used to express different content in these regions. Finally, we show how missing, ambiguous, or unreadable signs on damaged objects can be filled in with most likely predictions from the model. Taken together, our results indicate that the Indus script exhibits rich syntactic structure and the ability to represent diverse content, both of which are suggestive of a linguistic writing system rather than a nonlinguistic symbol system.
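The three uses of the Markov chain named in the abstract (generating sample texts, scoring a string against the model, and filling in a missing sign) can be illustrated with a minimal first-order sketch. This is not the authors' implementation: the abstract does not specify the estimation or smoothing method, so add-alpha smoothing is an assumption here, and the sign labels "A" to "D", the toy corpus, the boundary markers, and all function names are hypothetical.

```python
import math
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical text-boundary markers

def train_bigram(texts, vocab, alpha=1.0):
    """Estimate P(next | prev) with add-alpha smoothing (an assumption)."""
    counts = defaultdict(lambda: defaultdict(float))
    for text in texts:
        seq = [START] + list(text) + [END]
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    targets = list(vocab) + [END]
    model = {}
    for prev in [START] + list(vocab):
        total = sum(counts[prev].values()) + alpha * len(targets)
        model[prev] = {t: (counts[prev][t] + alpha) / total for t in targets}
    return model

def log_likelihood(model, text):
    """Log-probability of a whole text, including its start and end."""
    seq = [START] + list(text) + [END]
    return sum(math.log(model[p][n]) for p, n in zip(seq, seq[1:]))

def generate(model, rng, max_len=20):
    """Sample a new text sign by sign until END or max_len is reached."""
    out, prev = [], START
    for _ in range(max_len):
        targets = list(model[prev])
        weights = [model[prev][t] for t in targets]
        nxt = rng.choices(targets, weights=weights)[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return out

def fill_missing(model, text, vocab):
    """Fill a single None with the sign maximizing P(s|left) * P(right|s)."""
    i = text.index(None)
    left = text[i - 1] if i > 0 else START
    right = text[i + 1] if i + 1 < len(text) else END
    return max(vocab, key=lambda s: model[left][s] * model[s][right])

# Toy corpus of hypothetical sign sequences (not real Indus data).
corpus = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"], ["B", "C"]]
vocab = sorted({s for text in corpus for s in text})
model = train_bigram(corpus, vocab)

in_pattern = log_likelihood(model, ["A", "B", "C"])
out_of_pattern = log_likelihood(model, ["C", "B", "A"])
restored = fill_missing(model, ["A", None, "C"], vocab)
sample = generate(model, random.Random(0))
```

In this sketch a text that follows the corpus ordering scores higher than its reversal, which is the kind of comparison the abstract describes for texts from other regions. For a single gap in a first-order chain, maximizing the product of the two adjacent transition probabilities is equivalent to maximizing the likelihood of the whole text.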