Tsapatsoulis Nicolas, Djouvas Constantinos
Image Retrieval and Collective Intelligence Lab, Department of Communication and Internet Studies, Cyprus University of Technology, Limassol, Cyprus.
Front Robot AI. 2019 Jan 22;5:138. doi: 10.3389/frobt.2018.00138. eCollection 2018.
The era of big data has, among others, three characteristics: the huge amounts of data created every day and in every form by everyday people, artificial intelligence tools to mine information from those data, and effective algorithms that allow this data mining in real or close to real time. At the same time, opinion mining in social media is now an important component of social media marketing. Digital media giants such as Google and Facebook have developed and employed their own tools for that purpose. These tools are based on publicly available software libraries and tools such as (or ) and , which emphasize topic modeling and extract low-level features using deep learning approaches. So far, researchers have focused their efforts on opinion mining and especially on sentiment analysis of tweets. This trend reflects the availability of the Twitter API, which simplifies automatic collection of data (tweets) and testing of the proposed algorithms in real situations. However, if we are interested in realistic opinion mining, we should also consider mining opinions from social media platforms such as Facebook and Instagram, which are far more popular among everyday people. The basic purpose of this paper is to compare various kinds of low-level features, including those extracted through deep learning, as in and , and keywords suggested by the crowd, called herein, through a crowdsourcing platform. The application target is sentiment analysis of tweets and Facebook comments on commercial products. We also compare several machine learning methods for the creation of sentiment analysis models and conclude that, even in the era of big data, allowing people to annotate (a small portion of) data would allow effective artificial intelligence tools to be developed using the learning by example paradigm.
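As a rough illustration of the learning by example paradigm with crowd-suggested keywords as features, the following minimal sketch trains a Bernoulli naive Bayes classifier on a handful of annotated comments. It is not the authors' pipeline: the classifier choice, the `LEXICON` keywords, the `TRAIN` examples, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def featurize(text, lexicon):
    """Binary features: 1 if the lexicon keyword occurs in the text, else 0."""
    tokens = set(text.lower().split())
    return [1 if word in tokens else 0 for word in lexicon]


def train_naive_bayes(examples, lexicon):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing."""
    labels = sorted({label for _, label in examples})
    doc_counts = Counter(label for _, label in examples)
    word_counts = {label: [0] * len(lexicon) for label in labels}
    for text, label in examples:
        for i, present in enumerate(featurize(text, lexicon)):
            word_counts[label][i] += present
    model = {}
    for label in labels:
        n = doc_counts[label]
        log_prior = math.log(n / len(examples))
        # Smoothed probability that each keyword appears in a text of this class.
        probs = [(count + 1) / (n + 2) for count in word_counts[label]]
        model[label] = (log_prior, probs)
    return model


def predict(model, text, lexicon):
    """Return the label with the highest log-posterior score."""
    x = featurize(text, lexicon)

    def score(label):
        log_prior, probs = model[label]
        return log_prior + sum(
            math.log(p if xi else 1.0 - p) for xi, p in zip(x, probs)
        )

    return max(model, key=score)


# Hypothetical keyword lexicon and hand-annotated comments (toy data).
LEXICON = ["love", "great", "awful", "broken"]
TRAIN = [
    ("I love this phone", "positive"),
    ("great battery and great screen", "positive"),
    ("awful support", "negative"),
    ("arrived broken", "negative"),
]
MODEL = train_naive_bayes(TRAIN, LEXICON)

print(predict(MODEL, "great camera and I love it", LEXICON))
print(predict(MODEL, "this arrived broken", LEXICON))
```

The point of the sketch is only that a small, human-annotated sample plus a short keyword list is enough to fit a working model; any of the feature types or learning methods compared in the paper could be substituted for `featurize` and `train_naive_bayes`.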