Faigenbaum-Golovin Shira, Kipnis Alon, Bühler Axel, Piasetzky Eli, Römer Thomas, Finkelstein Israel
Department of Mathematics and Rhodes Information Initiative, Duke University, North Carolina, United States of America.
School of Computer Science, Reichman University, Herzliya, Israel.
PLoS One. 2025 Jun 3;20(6):e0322905. doi: 10.1371/journal.pone.0322905. eCollection 2025.
The Bible is the product of a complex process of oral and written transmission that stretched across centuries and traditions. This implies that the "original," or oldest, textual layers underwent ongoing revision over the course of hundreds of years. Although critical scholarship recognizes this fact, debates abound regarding the reconstruction of the different layers, their dates of composition, and their historical backgrounds. Traditional methodologies have grappled with these challenges through textual and diachronic criticism, employing linguistic, stylistic, inner-biblical, archaeological, and historical criteria. In this study, we use computer-assisted methods to address the question of authorship of biblical texts, employing a statistical analysis that is particularly sensitive to deviations in word frequencies. Here, the term "word" may be generalized to "n-gram" (a sequence of n consecutive words) or to other countable text features. This paper consists of two parts. In the first part, we focus on differentiating among three distinct scribal corpora across numerous chapters of the Enneateuch, the first nine books of the Bible. Specifically, we examine 50 chapters assigned, on the basis of exegetical considerations, to three corpora: the old layer of Deuteronomy (D), texts in Joshua through Kings belonging to the "Deuteronomistic History" (DtrH), and the Priestly writings (P). For pragmatic reasons, we chose entire chapters in which the number of verses potentially attributable to different authors or redactors is negligible. Without prior assumptions about author identity, our approach leverages subtle differences in word frequencies to distinguish among the three corpora and to identify author-dependent linguistic properties. Our analysis indicated that the first two scribal corpora, D (the oldest layers of Deuteronomy) and DtrH (the so-called Deuteronomistic History), are much more closely related to each other than either is to the third, P. This observation aligns with scholarly consensus. In addition, we attained high accuracy in attributing authorship by evaluating the similarity of each chapter to the reference corpora. In the second part of the paper, we use the three corpora as ground truth to examine other biblical texts whose authorship is disputed by biblical experts, demonstrating the potential contribution of the insights gained in the first part. Our paper sheds new light on the question of authorship of biblical texts by offering interpretable, statistically significant evidence of linguistic characteristics in the writing of biblical authors and redactors that can be identified automatically. Our methodology thus provides a new tool for addressing disputed matters in biblical studies.
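The abstract does not spell out the test statistic it uses. As a minimal sketch, assuming a Higher Criticism (HC) combination of per-feature exact binomial tests (a known technique for detecting sparse deviations in word-frequency tables, used here purely as an illustration and not as the paper's exact procedure), the code below counts n-grams, scores how far a chapter's frequencies deviate from each reference corpus, and attributes the chapter to the corpus with the smallest discrepancy. The function names, the `gamma` parameter, and the toy token streams are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n=1):
    """Count n-grams (tuples of n consecutive tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def binom_two_sided_pvalue(k, m, p):
    """Exact two-sided binomial test: total probability of all outcomes
    that are no more likely than k under Binomial(m, p)."""
    pmf = [math.comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(m + 1)]
    return min(1.0, sum(q for q in pmf if q <= pmf[k] * (1 + 1e-9)))


def hc_discrepancy(doc, corpus, gamma=0.5):
    """Higher Criticism of per-feature binomial p-values; larger means the
    document's feature frequencies deviate more from the corpus."""
    n_doc, n_corpus = sum(doc.values()), sum(corpus.values())
    # Under the null (same frequencies), this is the expected share of a
    # feature's pooled count that falls in the document.
    p0 = n_doc / (n_doc + n_corpus)
    pvals = sorted(
        binom_two_sided_pvalue(doc[f], doc[f] + corpus[f], p0)
        for f in set(doc) | set(corpus)
    )
    N = len(pvals)
    hc = 0.0  # floored at 0, so "no evidence of deviation" scores 0
    for i, p in enumerate(pvals[: max(1, int(gamma * N))], start=1):
        if 0.0 < p < 1.0:
            hc = max(hc, math.sqrt(N) * (i / N - p) / math.sqrt(p * (1 - p)))
    return hc


def attribute(chapter_tokens, corpora, n=1):
    """Assign a chapter to the reference corpus with the smallest HC discrepancy."""
    doc = ngram_counts(chapter_tokens, n)
    scores = {
        name: hc_discrepancy(doc, ngram_counts(toks, n))
        for name, toks in corpora.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy token streams standing in for two reference corpora (made-up data).
    corpora = {"A": ["x"] * 40 + ["y"] * 10, "B": ["x"] * 10 + ["y"] * 40}
    label, scores = attribute(["x"] * 18 + ["y"] * 2, corpora)
    print(label)  # → A
```

Passing `n=2` to `attribute` switches the features from single words to bigrams, mirroring the abstract's note that "word" may be generalized to "n-gram". HC is chosen here because it reacts to a few strongly deviating features rather than averaging small differences over the whole vocabulary; a real analysis would also need to exclude the chapter under test from its own reference corpus.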