A multi-label, semi-supervised classification approach applied to personality prediction in social media.

Author information

Lima Ana Carolina E S, de Castro Leandro Nunes

Affiliation

Natural Computing Laboratory, Mackenzie Presbyterian University, São Paulo, Brazil.

Publication information

Neural Netw. 2014 Oct;58:122-30. doi: 10.1016/j.neunet.2014.05.020. Epub 2014 Jun 11.

Abstract

Social media allow web users to create and share content pertaining to different subjects, exposing their activities, opinions, feelings and thoughts. In this context, online social media has attracted the interest of data scientists seeking to understand behaviours and trends, whilst collecting statistics for social sites. One potential application for these data is personality prediction, which aims to understand a user's behaviour within social media. Traditional personality prediction relies on users' profiles, their status updates, the messages they post, etc. Here, a personality prediction system for social media data is introduced that differs from most approaches in the literature, in that it works with groups of texts, instead of single texts, and does not take users' profiles into account. Also, the proposed approach extracts meta-attributes from texts and does not work directly with the content of the messages. The set of possible personality traits is taken from the Big Five model and allows the problem to be characterised as a multi-label classification task. The problem is then transformed into a set of five binary classification problems and solved by means of a semi-supervised learning approach, due to the difficulty in annotating the massive amounts of data generated in social media. In our implementation, the proposed system was trained with three well-known machine-learning algorithms, namely a Naïve Bayes classifier, a Support Vector Machine, and a Multilayer Perceptron neural network. The system was applied to predict the personality of Tweets taken from three datasets available in the literature, and resulted in an approximately 83% accurate prediction, with some of the personality traits presenting better individual classification rates than others.
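The transformation described in the abstract, one multi-label task split into five binary tasks that are each trained with partly unlabeled data, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not say which semi-supervised scheme is used, so simple self-training stands in for it; a nearest-centroid learner is swapped in for the Naïve Bayes, SVM, and MLP classifiers the authors report; and the two-dimensional feature vectors are hypothetical stand-ins for the meta-attributes extracted from groups of texts.

```python
from statistics import mean

# Big Five trait names; each becomes its own binary problem.
TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]

def binary_relevance(label_sets):
    """Turn multi-label targets into one 0/1 target list per trait."""
    return {t: [1 if t in labels else 0 for labels in label_sets]
            for t in TRAITS}

class NearestCentroid:
    """Toy base learner (an assumption, not one of the paper's classifiers)."""

    def fit(self, X, y):
        self.centroids = {}
        for c in set(y):
            rows = [x for x, yi in zip(X, y) if yi == c]
            self.centroids[c] = [mean(col) for col in zip(*rows)]
        return self

    def predict_with_margin(self, x):
        # Squared distance to each class centroid; the margin is the gap
        # between the closest and the second-closest centroid.
        dists = {c: sum((a - b) ** 2 for a, b in zip(x, cen))
                 for c, cen in self.centroids.items()}
        ranked = sorted(dists, key=dists.get)
        best = ranked[0]
        margin = dists[ranked[1]] - dists[best] if len(ranked) > 1 else float("inf")
        return best, margin

def self_train(X_lab, y_lab, X_unlab, min_margin=1.0, rounds=5):
    """Self-training: adopt confident predictions on unlabeled data as labels."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = NearestCentroid().fit(X, y)
    for _ in range(rounds):
        remaining, added = [], False
        for x in pool:
            label, margin = model.predict_with_margin(x)
            if margin >= min_margin:
                X.append(x)
                y.append(label)
                added = True
            else:
                remaining.append(x)
        pool = remaining
        if not added:
            break
        model = NearestCentroid().fit(X, y)
    return model

def train_all(X_lab, label_sets, X_unlab):
    """Train one self-trained binary model per trait."""
    targets = binary_relevance(label_sets)
    return {t: self_train(X_lab, targets[t], X_unlab) for t in TRAITS}

def predict_traits(models, x):
    """Return the set of traits whose binary model predicts 1."""
    return {t for t, m in models.items() if m.predict_with_margin(x)[0] == 1}

if __name__ == "__main__":
    # Hypothetical meta-attribute vectors for four labeled groups of texts.
    X_lab = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
    label_sets = [{"openness"}, {"openness"},
                  {"extraversion"}, {"extraversion", "neuroticism"}]
    X_unlab = [[0.05, 0.1], [0.95, 0.9], [0.5, 0.5]]
    models = train_all(X_lab, label_sets, X_unlab)
    print(predict_traits(models, [0.0, 0.1]))
```

Because each trait gets its own model, a sample can receive any combination of the five labels, which is what makes the overall task multi-label. The `min_margin` threshold is a hypothetical knob: raising it makes the sketch adopt fewer pseudo-labels per round.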
