Department of Medicine and Surgery, University of Milano-Bicocca, 20126 Milan, Italy.
Department of Informatics, Systems, and Communication, University of Milano-Bicocca, 20126 Milan, Italy.
Int J Environ Res Public Health. 2020 Feb 26;17(5):1510. doi: 10.3390/ijerph17051510.
Binge Drinking (BD) is a common risky behaviour that people rarely report to healthcare professionals, although personal communications about alcohol-related behaviours are not uncommon on social media. Following a data-driven approach focused on User-Generated Content, we aimed to detect potential binge drinkers by investigating their language and the topics they share. First, we gathered Twitter threads mentioning BD and alcohol-related behaviours, using unequivocal keywords identified by experts from previous evidence on BD. Subsequently, a random sample of the gathered tweets was manually labelled, and two supervised learning classifiers were trained on both linguistic and metadata features to distinguish tweets by genuine, unique users from those posted by media, bot, and commercial accounts. Based on this classification, we observed that approximately 55% of the 1 million alcohol-related tweets collected were automatically identified as belonging to non-genuine users. A third classifier was then trained on a subset of manually labelled tweets from accounts previously identified as genuine, to automatically identify potential binge drinkers based on linguistic features alone. On average, users classified as binge drinkers were quite similar to the typical genuine Twitter users in our sample. Nonetheless, analysing the social media content of genuine users who report risky behaviours remains a promising source for informing preventive programs.
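The abstract does not name the classifiers, the keyword list, or the metadata features used. As a rough illustration of the three-stage design it describes (keyword filter, genuine-vs-non-genuine account filter on linguistic plus metadata features, then a binge-drinking classifier on linguistic features alone), the following sketch swaps in a simple multinomial Naive Bayes model written with the Python standard library. It is an assumption-laden sketch, not the study's method: the keyword set, the metadata fields `has_url` and `followers`, the 10,000-follower threshold, and all training examples are hypothetical, and the study's two account classifiers are collapsed into one.

```python
import math
import re
from collections import Counter

# Hypothetical keyword list: the expert-curated keywords are not given in the abstract.
BD_KEYWORDS = {"binge", "wasted", "hammered", "shots"}


def tokenize(text):
    """Lowercase alphabetic tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def matches_keywords(text, keywords=BD_KEYWORDS):
    """Stage 1: keep only tweets mentioning at least one keyword."""
    return bool(set(tokenize(text)) & keywords)


def metadata_tokens(meta):
    """Encode hypothetical account metadata as pseudo-tokens."""
    toks = []
    if meta.get("has_url"):
        toks.append("__has_url__")
    if meta.get("followers", 0) > 10_000:  # arbitrary illustrative threshold
        toks.append("__many_followers__")
    return toks


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = set().union(*self.counts.values())
        self.total = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        """Return the class with the highest log posterior score."""
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.prior[c] / self.n)
            for t in tokens:
                score += math.log((self.counts[c][t] + 1) / (self.total[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


def pipeline(tweets, genuine_clf, bd_clf):
    """Keyword filter -> genuine-account filter -> binge-drinking label."""
    results = []
    for text, meta in tweets:
        if not matches_keywords(text):
            continue
        # Stage 2: linguistic + metadata features.
        if genuine_clf.predict(tokenize(text) + metadata_tokens(meta)) != "genuine":
            continue
        # Stage 3: linguistic features only.
        results.append((text, bd_clf.predict(tokenize(text))))
    return results


# Invented toy training data, for illustration only (not from the study).
genuine_docs = [
    (tokenize("so wasted last night lol"), "genuine"),
    (tokenize("binge again my head hurts"), "genuine"),
    (tokenize("buy shots tonight two for one offer") + ["__has_url__"], "non_genuine"),
    (tokenize("breaking binge drinking study news")
     + ["__has_url__", "__many_followers__"], "non_genuine"),
]
bd_docs = [
    (tokenize("so wasted after ten shots"), "bd"),
    (tokenize("blackout drunk binge all weekend"), "bd"),
    (tokenize("one glass of wine with dinner"), "not_bd"),
    (tokenize("nice beer with friends"), "not_bd"),
]
genuine_clf = NaiveBayes().fit(*zip(*genuine_docs))
bd_clf = NaiveBayes().fit(*zip(*bd_docs))

tweets = [
    ("so wasted last night, ten shots lol", {}),
    ("Breaking: binge drinking study news", {"has_url": True, "followers": 50_000}),
    ("never binge, one glass of wine with dinner", {}),
    ("lovely sunny day", {}),
]
results = pipeline(tweets, genuine_clf, bd_clf)
print(results)
```

In this toy run, the news-style tweet is dropped at the account stage and the weather tweet at the keyword stage; the remaining two are labelled by the text-only classifier. Keeping metadata out of the final stage mirrors the abstract's point that binge drinkers were identified from linguistic features alone.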