Massey Louis
Department of Mathematics and Computer Science, Royal Military College, Kingston, ON, Canada K7K 7B4.
Comput Intell Neurosci. 2014;2014:920892. doi: 10.1155/2014/920892. Epub 2014 Mar 13.
Topic identification (TI) is the process of determining the main themes present in natural language documents. The current TI modeling paradigm aims to acquire semantic information from the statistical properties of large text datasets. We instead investigate the mental mechanisms responsible for identifying topics in a single document given existing knowledge. Our main hypothesis is that topics result from the accumulated neural activation of loosely organized information stored in long-term memory (LTM). We tested this hypothesis experimentally with a computational model that simulates LTM activation. The model assumes that activation decay is an unavoidable phenomenon originating from the bioelectric nature of neural systems. Since decay should negatively affect topic quality, the model predicts the presence of short-term memory (STM), which keeps the focus of attention on a few words, with the expected outcome of restoring quality to a baseline level. Our experiments measured topic quality for over 300 documents under various decay rates and STM capacities. The results showed that accumulated activation of loosely organized information is an effective mental computational resource for identifying topics. They also confirmed that rapid decay is detrimental to topic quality, but that a limited-capacity STM restores quality to the baseline level and even exceeds it slightly.
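The mechanism described in the abstract (accumulated LTM activation, decay, and an STM that keeps a few recent words in focus) can be illustrated with a toy sketch. This is not the author's model: the function name `identify_topics`, the dictionary-based LTM, the multiplicative decay rule, and the unit activation increment are all assumptions made for illustration only.

```python
from collections import deque


def identify_topics(words, ltm, decay=0.0, stm_capacity=0, top_n=3):
    """Toy accumulated-activation sketch (hypothetical, not the paper's model).

    words        -- the document as a sequence of tokens
    ltm          -- assumed LTM: dict mapping a word to associated concepts
    decay        -- assumed fraction of activation lost per word read (0..1)
    stm_capacity -- number of recent words kept in the focus of attention
    """
    activation = {}
    # A deque with maxlen=0 stays empty, which models "no STM".
    stm = deque(maxlen=stm_capacity)

    for word in words:
        # Assumed decay rule: every stored activation shrinks multiplicatively.
        for concept in activation:
            activation[concept] *= (1.0 - decay)

        # The current word and the words held in STM all activate LTM.
        for focused in [word, *stm]:
            for concept in ltm.get(focused, ()):
                activation[concept] = activation.get(concept, 0.0) + 1.0

        stm.append(word)

    # Topics are the most strongly activated concepts (ties broken by name).
    ranked = sorted(activation.items(), key=lambda item: (-item[1], item[0]))
    return [concept for concept, _ in ranked[:top_n]]


# Made-up LTM associations and document, for illustration only.
toy_ltm = {
    "bank": ["finance", "river"],
    "loan": ["finance"],
    "interest": ["finance"],
    "water": ["river"],
}
toy_document = ["bank", "loan", "interest", "water"]

no_decay = identify_topics(toy_document, toy_ltm, top_n=1)
full_decay = identify_topics(toy_document, toy_ltm, decay=1.0, top_n=1)
with_stm = identify_topics(toy_document, toy_ltm, decay=1.0, stm_capacity=3, top_n=1)
```

In this toy setting, extreme decay leaves only the last word's associations active, while a small STM re-activates the recent words and recovers the topic found without decay, which mirrors the qualitative pattern the abstract reports.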