Learning Sciences Institute, Arizona State University, Phoenix, AZ, USA.
Behav Res Methods. 2013 Jun;45(2):499-515. doi: 10.3758/s13428-012-0258-1.
The Writing Pal is an intelligent tutoring system that provides writing strategy training. A large part of its artificial intelligence resides in the natural language processing algorithms to assess essay quality and guide feedback to students. Because writing is often highly nuanced and subjective, the development of these algorithms must consider a broad array of linguistic, rhetorical, and contextual features. This study assesses the potential for computational indices to predict human ratings of essay quality. Past studies have demonstrated that linguistic indices related to lexical diversity, word frequency, and syntactic complexity are significant predictors of human judgments of essay quality but that indices of cohesion are not. The present study extends prior work by including a larger data sample and an expanded set of indices to assess new lexical, syntactic, cohesion, rhetorical, and reading ease indices. Three models were assessed. The model reported by McNamara, Crossley, and McCarthy (Written Communication 27:57-86, 2010) including three indices of lexical diversity, word frequency, and syntactic complexity accounted for only 6% of the variance in the larger data set. A regression model including the full set of indices examined in prior studies of writing predicted 38% of the variance in human scores of essay quality with 91% adjacent accuracy (i.e., within 1 point). A regression model that also included new indices related to rhetoric and cohesion predicted 44% of the variance with 94% adjacent accuracy. The new indices increased accuracy but, more importantly, afford the means to provide more meaningful feedback in the context of a writing tutoring system.
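The "adjacent accuracy" figures reported above (91% and 94%) refer to the fraction of model-predicted essay scores that fall within 1 point of the human rating. The sketch below illustrates that metric under stated assumptions; the score values are hypothetical and are not data from the study, and the study's actual regression models are not reproduced here.

```python
# A minimal sketch of "adjacent accuracy": the proportion of predicted
# scores within a 1-point tolerance of the human holistic rating.
# All scores below are hypothetical, for illustration only.

def adjacent_accuracy(human, predicted, tolerance=1):
    """Fraction of rounded predictions within `tolerance` points of human ratings."""
    hits = sum(1 for h, p in zip(human, predicted)
               if abs(h - round(p)) <= tolerance)
    return hits / len(human)

human_scores = [3, 4, 2, 5, 3, 4]               # hypothetical holistic ratings
model_scores = [3.4, 4.8, 2.1, 2.9, 2.6, 4.2]   # hypothetical regression outputs

print(adjacent_accuracy(human_scores, model_scores))  # 5 of 6 within 1 point
```

In this toy example the fourth prediction (2.9 against a human score of 5) rounds to 3 and misses the 1-point window, so the metric is 5/6. Exact-match accuracy would be stricter; adjacent accuracy is the conventionally reported figure in automated essay-scoring work because human raters themselves often disagree by a point.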