Yang Xiaoping, Zhang Zhongxia, Zhang Zhongqiu, Mo Yuting, Li Lianbei, Yu Li, Zhu Peican
School of Information, Renmin University of China, Beijing 100872, China.
School of Computer Science, Northeastern University, Shenyang 110819, China.
Comput Intell Neurosci. 2016;2016:2093406. doi: 10.1155/2016/2093406. Epub 2016 Nov 29.
Manual annotation of sentiment lexicons is costly in labor and time, and it is difficult to quantify emotional intensity accurately by hand. In addition, an excessive emphasis on one specific field has greatly limited the applicability of domain sentiment lexicons (Wang et al., 2010). This paper performs statistical training on a large-scale Chinese corpus using a neural network language model and proposes an automatic method for constructing a multidimensional sentiment lexicon based on coordinate-offset constraints. To distinguish the sentiment polarities of words that may express either positive or negative meanings in different contexts, we further present a sentiment disambiguation algorithm that increases the flexibility of our lexicon. Finally, we present a global optimization framework that provides a unified way to combine several human-annotated resources for learning our 10-dimensional sentiment lexicon, SentiRuc. Experiments show the superior performance of the SentiRuc lexicon in the category labeling test, the intensity labeling test, and sentiment classification tasks. Notably, in the intensity labeling test, SentiRuc outperforms the runner-up by 21 percent.