Prud'hommeaux Emily, Roark Brian
Rochester Institute of Technology, College of Liberal Arts, 92 Lomb Memorial Dr., Rochester, NY 14623.
Google, Inc., 1001 SW Fifth Avenue, Suite 1100, Portland OR 97204.
Comput Linguist Assoc Comput Linguist. 2015 Dec;41(4):549-578. doi: 10.1162/coli_a_00232. Epub 2015 Dec 1.
One of the more recent applications of natural language processing algorithms is the analysis of spoken language data for diagnostic and remedial purposes, driven by the demand for simple, objective, and unobtrusive screening tools for neurological disorders such as dementia. The automated analysis of narrative retellings in particular shows potential as a component of such a screening tool, since the ability to produce accurate and meaningful narratives is noticeably impaired in individuals with dementia, with its frequent precursor, mild cognitive impairment, and with other neurodegenerative and neurodevelopmental disorders. In this article, we present a method for extracting narrative recall scores automatically and highly accurately from a word-level alignment between a retelling and the source narrative. We propose improvements to existing machine translation-based systems for word alignment, including a novel word alignment method that relies on random walks on a graph. This method achieves alignment accuracy superior to that of standard expectation maximization-based techniques in a fraction of the time that expectation maximization requires. In addition, the narrative recall score features extracted from these high-quality word alignments yield diagnostic classification accuracy comparable to that achieved using manually assigned scores and significantly higher than that achieved with summary-level text similarity metrics used in other areas of NLP. The methods can be trivially adapted to spontaneous language samples elicited with non-linguistic stimuli, which demonstrates their flexibility and generalizability.
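To make the scoring idea concrete, the following is a minimal sketch of how a recall score could be read off a word-level alignment. It is an illustration under stated assumptions, not the authors' implementation: the function name `recall_score`, the index-pair representation of the alignment, the mapping from source word positions to story elements, and the toy sentences are all hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def recall_score(
    alignment: Iterable[Tuple[int, int]],
    element_of_source_word: Dict[int, str],
) -> Tuple[int, Set[str]]:
    """Count the story elements of the source that a retelling recalls.

    alignment: (retelling_index, source_index) word pairs (assumed format).
    element_of_source_word: hypothetical map from a source word position
        to the story element that word belongs to.
    """
    recalled = {
        element_of_source_word[source_index]
        for _, source_index in alignment
        if source_index in element_of_source_word
    }
    return len(recalled), recalled


# Toy example (invented for illustration).
# Source:    "the boy lost his red kite"  -> positions 0..5
# Retelling: "a boy had a kite"           -> positions 0..4
elements = {1: "boy", 2: "lost", 4: "red", 5: "kite"}
alignment = [(1, 1), (4, 5)]  # "boy" -> "boy", "kite" -> "kite"

score, recalled = recall_score(alignment, elements)
print(score, sorted(recalled))
```

In this sketch an element counts as recalled if any retelling word aligns to a source word belonging to it, so the quality of the score depends directly on the quality of the alignment.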