Kernot David, Bossomaier Terry, Bradbury Roger
National Security College, Australian National University, Canberra, ACT, Australia.
National Security and ISR Division, Defence Science and Technology Group, Edinburgh, SA, Australia.
Front Psychol. 2018 Mar 15;9:289. doi: 10.3389/fpsyg.2018.00289. eCollection 2018.
Little is known of the private life of William Shakespeare, but he is famous for his collection of plays and poems, even though many of the works attributed to him were published anonymously. Determining the identity of Shakespeare has fascinated scholars for 400 years, and four significant figures in English literary history have been suggested as likely alternatives to Shakespeare for some disputed works: Bacon, de Vere, Stanley, and Marlowe. A myriad of computational and statistical tools and techniques have been used to determine the true authorship of his works. Many of these techniques rely on basic statistical correlations, word counts, collocated word groups, or keyword density, but no single method has been agreed upon. We suggest that an alternative technique that uses word semantics to draw on personality can provide an accurate profile of a person. To test this claim, we analyse the works of Shakespeare, Christopher Marlowe, and Elizabeth Cary. We use Word Accumulation Curves, Hierarchical Clustering overlays, Principal Component Analysis, and Linear Discriminant Analysis techniques in combination with RPAS, a multi-faceted text analysis approach that draws on a writer's personality, or self, to identify subtle characteristics within a person's writing style. Here we find that RPAS can separate the works of known authorship by Shakespeare from those of Marlowe and Cary. Further, it separates their contested works, that is, works suspected of being written by others. While few authorship identification techniques identify self from the way a person writes, we demonstrate that these stylistic characteristics were as applicable 400 years ago as they are today and have the potential to be used within cyberspace for law enforcement purposes.
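The abstract names Word Accumulation Curves without defining them. A minimal sketch, assuming the common vocabulary-growth reading (the running count of distinct words as a text is read token by token), might look like the following. The function name `word_accumulation_curve` and the simple regular-expression tokenizer are hypothetical choices made for illustration; this is not the authors' implementation and does not represent RPAS.

```python
import re


def word_accumulation_curve(text):
    """Return a list whose i-th entry is the number of distinct words
    seen in the first i + 1 tokens of `text`.

    Assumption: tokens are lowercase runs of letters and apostrophes.
    The paper may tokenize differently.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    seen = set()
    curve = []
    for token in tokens:
        seen.add(token)          # no effect if the word was already seen
        curve.append(len(seen))  # distinct words observed so far
    return curve


sample = "to be or not to be that is the question"
print(word_accumulation_curve(sample))
# prints [1, 2, 3, 4, 4, 4, 5, 6, 7, 8]
```

Under this reading, a curve that flattens early indicates heavy reuse of a small vocabulary, while a curve that keeps climbing indicates a steady stream of new words, so curves from different texts can be compared by shape.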