Zhang Shitao
School of Network Communication, Zhejiang Yuexiu University, Shaoxing, China.
Front Psychol. 2021 Sep 28;12:758967. doi: 10.3389/fpsyg.2021.758967. eCollection 2021.
Text sentiment classification is a fundamental sub-area of natural language processing, and sentiment classifiers are highly domain-dependent. For example, the phrase "traffic jam" expresses negative sentiment in the sentence "I was stuck in a traffic jam on the elevated road for 2 h." In the transportation domain, however, the same phrase carries no sentiment in the sentence "Bread and water are essential items in traffic jams." The most common approach is to use domain-specific data samples to classify text in that domain, but text sentiment analysis based on machine learning relies on sufficient labeled training data. To address sentiment classification of news text when labeled news data are insufficient, together with the domain adaptation of text sentiment classifiers, an intelligent model, the transfer learning discriminative dictionary learning algorithm (TLDDL), is proposed for cross-domain text sentiment classification. Within the dictionary learning framework, samples from the different domains are projected into a common subspace, and a domain-invariant dictionary is built to connect the two domains. To improve the discriminative performance of the algorithm, a discrimination-information-preserving term and a principal component analysis (PCA) term are combined in the objective function. Experiments are performed on three public text datasets. The results show that the proposed algorithm improves sentiment classification performance on texts in the target domain.
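The general idea of projecting two domains into a subspace and coding them over one shared dictionary can be sketched as follows. This is a minimal illustration, not the TLDDL algorithm itself: the abstract does not give the objective function, so the ridge (L2) coding step, the alternating least-squares dictionary update, the linear classifier, the synthetic data, and all function and variable names here are assumptions made for the example.

```python
import numpy as np

def pca_projection(X, k):
    """Top-k principal directions of X (n, d) as a (d, k) orthonormal matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def encode(Z, D, lam=0.1):
    """Ridge codes for rows of Z (n, k) over dictionary D (k, m); returns (n, m)."""
    m = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(m), D.T @ Z.T).T

def learn_shared_dictionary(Z, n_atoms, lam=0.1, n_iter=20, seed=0):
    """Alternate ridge coding and least-squares dictionary updates on pooled data."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Z.shape[1], n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        A = encode(Z, D, lam)  # (n, m) codes for the current dictionary
        # Least-squares dictionary update with a small ridge for stability.
        G = A.T @ A + 1e-3 * np.eye(n_atoms)
        D = np.linalg.solve(G, A.T @ Z).T
        D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    return D

# Synthetic stand-ins for a labeled source domain and an unlabeled target domain.
rng = np.random.default_rng(0)
Xs = rng.standard_normal((60, 20))
ys = np.where(Xs[:, 0] > 0, 1, -1)           # hypothetical sentiment labels
Xt = rng.standard_normal((40, 20)) + 0.5     # shifted distribution

# Project both domains into one subspace, then learn one dictionary for both.
P = pca_projection(np.vstack([Xs, Xt]), k=8)
Zs, Zt = Xs @ P, Xt @ P
D = learn_shared_dictionary(np.vstack([Zs, Zt]), n_atoms=12)

# Train a ridge classifier on source codes and apply it to target codes.
As, At = encode(Zs, D), encode(Zt, D)
W = np.linalg.solve(As.T @ As + 0.1 * np.eye(As.shape[1]), As.T @ ys)
pred_t = np.where(At @ W > 0, 1, -1)
```

Because both domains are coded over the same atoms, a classifier trained on source codes can be applied directly to target codes. A discriminative variant in the spirit of the abstract would add label-dependent terms to the dictionary objective rather than training the classifier afterwards.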