Ong Song-Quan, Ahmad Hamdan
Institute for Tropical Biology and Conservation, Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia.
Department of Ecoscience and Arctic Research Centre, Aarhus University, Aarhus, Denmark.
PeerJ. 2024 Mar 1;12:e17045. doi: 10.7717/peerj.17045. eCollection 2024.
Mosquito-borne diseases (MBDs) are a major threat worldwide, and public consultation on these diseases is critical to disease-control decision-making. However, traditional public surveys are time-consuming and labor-intensive, which prevents timely decision-making. Recent studies have explored text analytic approaches to elicit public comments from social media for public health purposes. This study therefore demonstrates a text analytics pipeline that identifies the MBD topics that were discussed on Twitter and significantly influenced public opinion. A total of 25,000 tweets were retrieved from Twitter; topics were modelled using latent Dirichlet allocation (LDA), and sentiment polarities were calculated using the VADER (Valence Aware Dictionary and sEntiment Reasoner) model. After data cleaning, 6,243 tweets remained, and these were processed with the feature selection algorithms. Feature selection algorithms, including Boruta and LASSO (least absolute shrinkage and selection operator), were used to determine the importance of topics to public opinion. The results were validated using multinomial logistic regression (MLR) performance and expert judgement. Important issues such as breeding sites, mosquito control, impact/funding, time of year, other diseases with similar symptoms, mosquito-human interaction and biomarkers for diagnosis were identified by both LDA and the experts. The MLR results show that the topics selected by LASSO performed significantly better than those selected by the other algorithms, and the experts further justify these topics in the discussion.
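The sentiment-labelling step can be illustrated with a minimal sketch. VADER sums lexicon valences for a piece of text and squashes the total into a compound score in [-1, 1]; the VADER documentation suggests treating compound >= 0.05 as positive, <= -0.05 as negative and anything in between as neutral, which yields a three-class outcome suitable for a multinomial model. The abstract does not state which thresholds the authors used, so the 0.05 cut-off, the helper names `vader_normalize` and `polarity_label`, and the raw valence sums below are assumptions for illustration only. In practice the compound score would come directly from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in the `vaderSentiment` package.

```python
import math
from collections import Counter


def vader_normalize(raw_sum: float, alpha: float = 15.0) -> float:
    """Squash an unbounded sum of valences into [-1, 1], as VADER's compound score does."""
    norm = raw_sum / math.sqrt(raw_sum * raw_sum + alpha)
    return max(-1.0, min(1.0, norm))


def polarity_label(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score to one of three classes (assumed +/-0.05 cut-off)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical summed valences for four tweets (not data from the study).
    raw_sums = [3.1, 0.0, -1.9, 0.4]
    labels = [polarity_label(vader_normalize(s)) for s in raw_sums]
    for s, lab in zip(raw_sums, labels):
        print(f"raw {s:+.1f} -> compound {vader_normalize(s):+.4f} -> {lab}")
    print("class counts:", Counter(labels))
```

Each tweet's label then serves as the response variable whose relationship to the LDA topics is examined by feature selection and MLR.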
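Topic selection with LASSO can likewise be sketched. Assuming, hypothetically, that each tweet is described by its LDA topic proportions (the columns of `X`) and that a numeric sentiment score such as the VADER compound is the target `y`, an L1-penalised least-squares fit drives the coefficients of uninformative topics to exactly zero, and the topics with non-zero coefficients count as "selected". The abstract does not say how LASSO was configured in the study; the objective (1/2n)·RSS + λ·Σ|w_j|, the coordinate-descent solver, the function names, the penalty value and the toy data are all illustrative assumptions.

```python
def soft_threshold(rho: float, lam: float) -> float:
    """Proximal operator of the L1 penalty: shrink rho towards zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=500, tol=1e-10):
    """Minimise (1/2n)*||y - b - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    The intercept b is handled by centring X and y, so it is not penalised.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual y - Xw while w is all zeros
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:  # constant column: coefficient stays at zero
                continue
            # correlation of column j with the partial residual that excludes column j
            rho = sum(Xc[i][j] * (resid[i] + w[j] * Xc[i][j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                w[j] = new_wj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return w


def selected_topics(w):
    """Indices of topics whose LASSO coefficient is non-zero."""
    return [j for j, wj in enumerate(w) if wj != 0.0]


if __name__ == "__main__":
    # Hypothetical toy data: three feature columns, y depends only on column 0.
    X = [[0, 0, 0], [1, 0, 1], [0, 1, 1], [1, 1, 0]]
    y = [2.0 * row[0] for row in X]
    w = lasso_coordinate_descent(X, y, lam=0.1)
    print("coefficients:", [round(v, 3) for v in w])
    print("selected topics:", selected_topics(w))
```

In the toy data only the first column drives `y`, so the fit keeps that column (its coefficient shrunk by the penalty from 2.0 to 1.6) and sets the other two to exactly zero. Larger `lam` values select fewer topics; in a real analysis the penalty would usually be tuned, for example by cross-validation.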