Gómez-Adorno Helena, Sidorov Grigori, Pinto David, Vilariño Darnes, Gelbukh Alexander
Instituto Politécnico Nacional, Centro de Investigación en Computación, Av. Juan de Dios Bátiz S/N, Mexico City 07738, Mexico.
Benemérita Universidad Autónoma de Puebla, Facultad de Ciencias de la Computación, Av. San Claudio y 14 Sur, Puebla 72570, Mexico.
Sensors (Basel). 2016 Aug 29;16(9):1374. doi: 10.3390/s16091374.
We apply the integrated syntactic graph feature extraction methodology to the task of automatic authorship detection. This graph-based representation integrates different levels of language description into a single structure. We extract textual patterns based on features obtained from shortest path walks over integrated syntactic graphs and use them to determine the authors of documents. On average, our method outperforms state-of-the-art approaches and, unlike existing methods, gives consistently high results across different corpora. Our results show that these textual patterns are useful for the task of authorship attribution.
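The core idea of turning shortest path walks into textual-pattern features can be illustrated with a small sketch. This is not the authors' implementation. It assumes hand-built toy graphs (a hypothetical `ROOT` node with labelled dependency-style edges) in place of graphs built from parsed text. It uses breadth-first search to find each node's shortest path from the root and a simple nearest-profile cosine-similarity classifier. That classifier is an illustrative stand-in, not necessarily the classifier used in the paper. All function and variable names are hypothetical.

```python
from collections import Counter, deque
from math import sqrt

# Hypothetical toy graph format: node -> list of (neighbor, edge_label).
# In the described methodology the graph would be built from parsed text;
# here it is written by hand for illustration only.
Graph = dict[str, list[tuple[str, str]]]


def shortest_path_features(graph: Graph, root: str = "ROOT") -> Counter:
    """BFS from the root; every reachable node contributes the label
    sequence of its shortest path from the root as one feature."""
    paths = {root: [root]}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor, label in graph.get(node, []):
            if neighbor not in paths:  # first BFS visit gives a shortest path
                paths[neighbor] = paths[node] + [label, neighbor]
                queue.append(neighbor)
    return Counter("/".join(p) for n, p in paths.items() if n != root)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def attribute(unknown: Graph, profiles: dict[str, Counter]) -> str:
    """Assign the unknown document to the author with the most similar profile
    (an assumed nearest-profile classifier, used here only for illustration)."""
    feats = shortest_path_features(unknown)
    return max(profiles, key=lambda author: cosine(feats, profiles[author]))


# Usage with toy data: two known "authors" and one unattributed document.
doc_a = {"ROOT": [("run", "root")], "run": [("dog", "nsubj"), ("fast", "advmod")]}
doc_b = {"ROOT": [("is", "root")], "is": [("sky", "nsubj"), ("blue", "acomp")]}
profiles = {"A": shortest_path_features(doc_a), "B": shortest_path_features(doc_b)}
unknown = {"ROOT": [("run", "root")], "run": [("cat", "nsubj"), ("fast", "advmod")]}
print(attribute(unknown, profiles))  # prints: A
```

The unknown document shares two of its three path features (`ROOT/root/run` and `ROOT/root/run/advmod/fast`) with author A and none with author B, so it is attributed to A. A real system would aggregate features over many documents per author and would likely use a trained classifier rather than a single similarity comparison.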