Zhan Yongcheng, Liu Ruoran, Li Qiudan, Leischow Scott James, Zeng Daniel Dajun
Department of Management Information Systems, Eller College of Management, The University of Arizona, Tucson, AZ, United States.
The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China.
J Med Internet Res. 2017 Jan 20;19(1):e24. doi: 10.2196/jmir.5780.
The electronic cigarette (e-cigarette) is an emerging product whose market has grown rapidly in recent years. Social media has become an important platform for information seeking and sharing. We aim to mine hidden topics from e-cigarette datasets collected from different social media platforms.
This paper aims to gain a systematic understanding of the characteristics of various types of social media, which will provide insight into how consumers and policy makers can effectively use social media to track e-cigarette-related content and adjust their decisions and policies.
We collected data from Reddit (27,638 e-cigarette flavor-related posts from January 1, 2011, to June 30, 2015), JuiceDB (14,433 e-juice reviews from June 26, 2013, to November 12, 2015), and Twitter (13,356 "e-cig ban"-related tweets from January 1, 2010, to June 30, 2015). Latent Dirichlet Allocation, a generative model for topic modeling, was used to extract topics from these data.
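As a rough illustration of how Latent Dirichlet Allocation assigns words to topics, the following is a minimal sketch using collapsed Gibbs sampling. The abstract does not state which inference procedure, number of topics, or hyperparameters the authors used, so the function names (`lda_gibbs`, `top_words`), the values of `alpha` and `beta`, and the tiny `toy_docs` corpus are all assumptions made for illustration, not the study's data or settings.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]               # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                              # tokens in topic k
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Hypothetical toy corpus (NOT data from the study).
toy_docs = [
    "strawberry custard flavor sweet".split(),
    "custard vanilla flavor sweet".split(),
    "ban regulation fda policy".split(),
    "fda regulation ban vote".split(),
]
vocab, doc_topic, topic_word = lda_gibbs(toy_docs, num_topics=2, iterations=50)
for topic_index, words in enumerate(top_words(vocab, topic_word)):
    print(topic_index, words)
```

On real corpora of the size described above, a tested library implementation would be the practical choice; the sketch only shows the count bookkeeping that underlies the model.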
We found four types of topics across the platforms: (1) promotions, (2) flavor discussions, (3) experience sharing, and (4) regulation debates. Promotions included sales from vendors to users, as well as trades among users. A total of 10.72% (2,962/27,638) of the posts from Reddit were related to trading. Promotion links were found between social media platforms: most of the links in JuiceDB (87.30%) pointed to Reddit posts. JuiceDB and Reddit yielded consistent flavor categories. E-cigarette vaping methods and features such as steeping, throat hit, and vapor production were broadly discussed on both Reddit and JuiceDB. Reddit provided space for policy discussions, and the majority of the posts (60.7%) held a negative attitude toward regulations, whereas Twitter was used to launch campaigns using certain hashtags. Our findings are based on data across different platforms. The topic distribution differed significantly between Reddit and JuiceDB (P<.001), which indicated that user discussions focused on different perspectives on each platform.
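The reported trading share can be rechecked from the counts given above, and a comparison of topic distributions between two platforms can be sketched with a chi-square test of homogeneity. The abstract does not say which test produced P<.001, so the chi-square test here is an assumption, and the `toy_counts` table is a hypothetical example rather than the study's topic counts. The closed-form P value used below holds only for 2 degrees of freedom.

```python
import math

# Recheck the reported share of trading-related Reddit posts.
trading_share = round(100 * 2962 / 27638, 2)
print(trading_share)  # 10.72


def chi_square_homogeneity(table):
    """Return (statistic, degrees of freedom) for a rows-by-columns count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts of posts per topic (columns) on two platforms (rows).
toy_counts = [
    [30, 50, 20],
    [60, 25, 15],
]
statistic, dof = chi_square_homogeneity(toy_counts)

# For exactly 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
p_value = math.exp(-statistic / 2) if dof == 2 else None
print(statistic, dof, p_value)
```

For tables with other degrees of freedom, the P value would need a general chi-square survival function from a statistics library.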
This study examined Reddit, JuiceDB, and Twitter as social media data sources for e-cigarette research. The mined findings could be used further by other researchers and policy makers. Built on automatic topic modeling, the proposed unified feedback model could help policy makers consider comprehensively how to collect valuable feedback from social media.