Walter Kintsch
Institute of Cognitive Science, University of Colorado, USA.
J Biomed Inform. 2002 Feb;35(1):3-7. doi: 10.1016/s1532-0464(02)00004-7.
This paper introduces latent semantic analysis (LSA), a machine learning method for representing the meaning of words, sentences, and texts. LSA induces a high-dimensional semantic space by reading a very large amount of text. The meanings of words and texts can be represented as vectors in this space and hence can be compared automatically and objectively.
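To make the idea of a semantic space concrete, the following is a minimal sketch, not the paper's implementation. It assumes the common formulation of LSA as a truncated singular value decomposition of a word-by-document count matrix. The toy corpus, the choice of k = 2 dimensions, and the names `word_vecs`, `cosine`, and `word_similarity` are illustrative assumptions; real LSA spaces are built from very large corpora, use a few hundred dimensions, and usually weight the counts before decomposition.

```python
import numpy as np

# Toy corpus (an assumption for illustration; real LSA reads a very large corpus).
docs = [
    "the doctor examined the patient",
    "the nurse treated the patient",
    "the doctor and the nurse work in the hospital",
    "the student wrote an essay",
    "the teacher graded the essay",
    "the student and the teacher met at school",
]

# Build a word-by-document count matrix.
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        X[index[w], j] += 1.0

# Truncated SVD: keep only the k largest singular values.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2  # illustrative; far too small for a real semantic space
word_vecs = U[:, :k] * s[:k]  # one k-dimensional vector per word


def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def word_similarity(w1, w2):
    """Compare two vocabulary words by the cosine of their LSA vectors."""
    return cosine(word_vecs[index[w1]], word_vecs[index[w2]])


print(word_similarity("doctor", "nurse"))
print(word_similarity("doctor", "essay"))
```

The cosine is the usual comparison measure in this setting: it depends only on the direction of the vectors, so frequent and rare words can be compared on the same scale.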
A generative theory of the mental lexicon based on LSA is described. The word vectors LSA constructs are context-free, and each word, irrespective of how many meanings or senses it has, is represented by a single vector. However, when a word is used in different contexts, context-appropriate word senses emerge.
Several applications of LSA to educational software are described. They rely on the ability of LSA to quickly compare the content of texts, such as an essay written by a student and a target essay.
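As a rough illustration of comparing a student essay with a target essay, the sketch below represents each text as the sum of its word vectors and scores the pair by cosine similarity. This is an assumption-laden toy, not the software described in the paper: the corpus, the dimensionality, the sample essays, and the helper names `build_space`, `text_vector`, and `similarity` are all hypothetical, and summing word vectors is one common way to obtain a text vector rather than a confirmed detail of the author's tools.

```python
import numpy as np


def build_space(corpus, k):
    """Build a toy LSA space: returns (word -> row index, word vectors)."""
    vocab = sorted({w for doc in corpus for w in doc.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(corpus)))
    for j, doc in enumerate(corpus):
        for w in doc.lower().split():
            counts[index[w], j] += 1.0
    U, s, _ = np.linalg.svd(counts, full_matrices=False)
    return index, U[:, :k] * s[:k]


def text_vector(text, index, word_vecs):
    """Represent a text as the sum of its known word vectors."""
    vec = np.zeros(word_vecs.shape[1])
    for w in text.lower().split():
        if w in index:  # words outside the training corpus are skipped
            vec += word_vecs[index[w]]
    return vec


def similarity(text_a, text_b, index, word_vecs):
    """Cosine similarity between two texts (0.0 if either has no known words)."""
    a = text_vector(text_a, index, word_vecs)
    b = text_vector(text_b, index, word_vecs)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Hypothetical training corpus and essays, for illustration only.
corpus = [
    "the patient reported chest pain and shortness of breath",
    "the doctor ordered an ecg for the patient with chest pain",
    "shortness of breath can indicate heart failure",
    "the student read a book about history",
    "the history book described an ancient war",
    "the teacher asked the student about the war",
]
index, word_vecs = build_space(corpus, k=2)

target_essay = "the patient with chest pain received an ecg"
student_essay = "the doctor ordered an ecg because of chest pain"
score = similarity(student_essay, target_essay, index, word_vecs)
print(score)
```

A grading tool built on this idea would still need a mapping from similarity scores to grades, for example one calibrated against essays already graded by human raters.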
An LSA-based software tool is sketched for machine grading of clinical case summaries written by medical students.