Automatic Construction and Global Optimization of a Multisentiment Lexicon.

Authors

Yang Xiaoping, Zhang Zhongxia, Zhang Zhongqiu, Mo Yuting, Li Lianbei, Yu Li, Zhu Peican

Affiliations

School of Information, Renmin University of China, Beijing 100872, China.

School of Computer Science, Northeastern University, Shenyang 110819, China.

Publication information

Comput Intell Neurosci. 2016;2016:2093406. doi: 10.1155/2016/2093406. Epub 2016 Nov 29.

Abstract

Manual annotation of sentiment lexicons is costly in labor and time, and it is difficult to quantify emotional intensity accurately. Moreover, an excessive focus on one specific field greatly limits the applicability of domain sentiment lexicons (Wang et al., 2010). This paper performs statistical training on a large-scale Chinese corpus with a neural network language model and proposes an automatic method for constructing a multidimensional sentiment lexicon based on coordinate-offset constraints. To distinguish the sentiment polarity of words that may express either positive or negative meanings in different contexts, we further present a sentiment disambiguation algorithm that increases the flexibility of our lexicon. Lastly, we present a global optimization framework that provides a unified way to combine several human-annotated resources for learning our 10-dimensional sentiment lexicon, SentiRuc. Experiments show the superior performance of the SentiRuc lexicon in the category labeling test, the intensity labeling test, and sentiment classification tasks. Notably, in the intensity labeling test, SentiRuc outperforms the runner-up by 21 percent.

