College of Computing and Digital Media, DePaul University, 243 South Wabash Avenue, Chicago, IL 60604, USA.
Behav Res Methods. 2012 Sep;44(3):622-33. doi: 10.3758/s13428-012-0214-0.
The present study explored different approaches for automatically scoring student essays written on the basis of multiple texts. Specifically, these approaches were developed to classify whether important elements of the texts were present in the essays. The first was a simple pattern-matching approach called "multi-word" that allowed flexible matching of words and phrases in the sentences. The second was latent semantic analysis (LSA), which compared student sentences with the original source sentences using its high-dimensional, vector-based representation. The third was a machine-learning technique, support vector machines, which learned a classification scheme from the corpus. The results indicated that the LSA-based system was superior for detecting explicit content from the texts, whereas the multi-word pattern-matching approach was better for detecting inferences that went beyond or across texts. These findings suggest that the best approach for analyzing essays of this nature should draw upon multiple natural language processing approaches.
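The abstract describes the multi-word approach only as a simple pattern matcher that allows flexible matching of words and phrases in sentences. The following minimal Python sketch shows one way such a matcher could work. Several details are assumptions made for this illustration and are not the rules used in the study: the pattern format (ordered slots of alternative words, with a trailing `*` as a prefix wildcard), the `max_gap` limit on intervening words, and the example pattern itself. The function names are also hypothetical.

```python
import re
from typing import List, Sequence

# Hypothetical pattern format (an assumption, not the study's notation):
# a pattern is an ordered list of slots; each slot lists alternative words.
# An alternative ending in "*" matches any word starting with that prefix.
Pattern = Sequence[Sequence[str]]


def tokenize(sentence: str) -> List[str]:
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def word_matches(word: str, alternative: str) -> bool:
    """Exact match, or prefix match when the alternative ends in '*'."""
    if alternative.endswith("*"):
        return word.startswith(alternative[:-1])
    return word == alternative


def slot_matches(word: str, slot: Sequence[str]) -> bool:
    """True if the word matches any alternative in the slot."""
    return any(word_matches(word, alt) for alt in slot)


def multiword_match(sentence: str, pattern: Pattern, max_gap: int = 3) -> bool:
    """Return True if every slot matches a word, in pattern order, with at
    most max_gap unmatched words between consecutive slot matches."""
    words = tokenize(sentence)

    def search(slot_idx: int, start: int, limit: int) -> bool:
        if slot_idx == len(pattern):
            return True  # all slots matched
        for i in range(start, min(limit, len(words))):
            if slot_matches(words[i], pattern[slot_idx]):
                # The next slot must occur within max_gap words of this one;
                # backtrack and try a later match if it does not.
                if search(slot_idx + 1, i + 1, i + 2 + max_gap):
                    return True
        return False

    # The first slot may match anywhere in the sentence.
    return search(0, 0, len(words))


if __name__ == "__main__":
    pattern = [["population", "people"], ["mov*", "migrat*"], ["city", "cities", "urban"]]
    sentence = "Many people were moving to the cities for jobs."
    print(multiword_match(sentence, pattern))             # True
    print(multiword_match(sentence, pattern, max_gap=1))  # False
```

With the hypothetical pattern above, the example sentence matches under the default `max_gap=3`. It does not match with `max_gap=1`, because two words ("to", "the") separate "moving" from "cities". A sentence with the same words in a different order, such as "Cities grew as people moved.", also fails, because slots must match in order. Raising `max_gap` or adding alternatives makes the matcher more permissive at the cost of more false matches.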