Identifying Objective and Subjective Words via Topic Modeling.

Publication information

IEEE Trans Neural Netw Learn Syst. 2018 Mar;29(3):718-730. doi: 10.1109/TNNLS.2016.2626379. Epub 2017 Jan 17.

Abstract

It is observed that distinct words in a given document have either a strong or a weak ability to deliver facts (i.e., the objective sense) or express opinions (i.e., the subjective sense), depending on the topics they are associated with. Motivated by the intuitive assumption that different words have varying degrees of discriminative power in delivering the objective sense or the subjective sense with respect to their assigned topics, a model named identified objective-subjective latent Dirichlet allocation (LDA) (iosLDA) is proposed in this paper. In the iosLDA model, the simple Pólya urn model adopted in traditional topic models is modified by incorporating it with a probabilistic generative process, in which the novel "Bag-of-Discriminative-Words" (BoDW) representation for the documents is obtained. Each document has two different BoDW representations, one for the objective sense and one for the subjective sense, which are employed in the joint objective and subjective classification instead of the traditional Bag-of-Topics representation. The experiments reported on documents and images demonstrate that: 1) the BoDW representation is more predictive than the traditional ones; 2) iosLDA boosts the performance of topic modeling via the joint discovery of latent topics and the different objective and subjective power hidden in every word; and 3) iosLDA has lower computational complexity than supervised LDA, especially as the number of topics increases.
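
For readers unfamiliar with the simple Pólya urn model mentioned in the abstract, the following is a minimal sketch of the unmodified urn scheme only: a ball is drawn, then returned together with one extra ball of the same colour. It is not the paper's modified model, and the function name `simple_polya_urn`, the colour labels, and the seed are assumptions made for illustration.

```python
import random
from collections import Counter


def simple_polya_urn(initial_counts, n_draws, seed=0):
    """Draw n_draws balls from a simple Pólya urn.

    After each draw the ball is returned together with one extra ball of
    the same colour, so colours drawn early become more likely later.
    Illustrative sketch only; not the modified urn described in the paper.
    """
    rng = random.Random(seed)
    urn = Counter(initial_counts)  # colour -> number of balls in the urn
    draws = []
    for _ in range(n_draws):
        colours = list(urn.keys())
        weights = [urn[c] for c in colours]
        # Pick a colour with probability proportional to its current count.
        colour = rng.choices(colours, weights=weights, k=1)[0]
        urn[colour] += 1  # reinforcement: add one extra ball of that colour
        draws.append(colour)
    return draws, dict(urn)


# Hypothetical usage: two colours standing in for two word types.
draws, urn = simple_polya_urn({"fact": 1, "opinion": 1}, n_draws=10)
print(draws)
print(urn)
```

Each draw adds exactly one ball, so the urn always holds the initial total plus the number of draws. This "rich get richer" behaviour is what the abstract says iosLDA modifies.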
