Department of Industrial & Systems Engineering, Dongguk University, Jung-gu, Seoul, South Korea.
PLoS One. 2024 Sep 4;19(9):e0309842. doi: 10.1371/journal.pone.0309842. eCollection 2024.
As the influence and risk of infectious diseases grow, efforts are being made to predict the number of confirmed infectious disease patients, but research that incorporates the qualitative opinions of social media users is scarce. Social data can change the psychology and behavior of crowds through information dissemination, which in turn can affect the spread of infectious diseases. Existing studies have predicted the number of confirmed cases from confirmed-case counts and spatial data, whereas studies that use opinions expressed in social data, which influence human behavior related to the spread of infectious diseases, remain inadequate. Therefore, we propose a new approach that applies opinion mining to perform sentiment analysis of social data and uses machine learning techniques to predict the number of confirmed cases of infectious diseases. To build a sentiment dictionary specialized for infectious disease prediction, we used Word2Vec to expand an existing sentiment dictionary and calculated the daily sentiment polarity, separated into positive and negative polarities, from the collected social data. We then developed an algorithm that predicts the number of confirmed patients by using both the positive and negative polarities with DNN, LSTM, and GRU models. With the proposed method, predictions of the number of confirmed cases obtained using opinion mining were 1.12% and 3% better than those obtained without opinion mining for the LSTM and GRU models, respectively. Social data are therefore expected to be used from a qualitative perspective in predicting the number of confirmed cases of infectious diseases.
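The dictionary expansion and daily polarity steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional vectors stand in for trained Word2Vec embeddings, and the function names (`expand_lexicon`, `daily_polarity`), the cosine threshold of 0.9, the whitespace tokenizer, and the sample posts and dates are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seed, vectors, threshold=0.9):
    """Add words whose most similar seed word has cosine >= threshold.

    Each added word inherits the polarity of that seed word.
    The threshold value is a hypothetical choice, not taken from the paper.
    """
    expanded = dict(seed)
    for word, vec in vectors.items():
        if word in seed:
            continue
        best_seed, best_sim = None, threshold
        for seed_word in seed:
            if seed_word in vectors:
                sim = cosine(vec, vectors[seed_word])
                if sim >= best_sim:
                    best_seed, best_sim = seed_word, sim
        if best_seed is not None:
            expanded[word] = seed[best_seed]
    return expanded


def daily_polarity(posts, lexicon):
    """Count positive and negative lexicon hits per date.

    posts is an iterable of (date, text) pairs; tokens are lowercased
    and split on whitespace, which is a simplification.
    """
    counts = defaultdict(lambda: {"pos": 0, "neg": 0})
    for date, text in posts:
        for token in text.lower().split():
            label = lexicon.get(token)
            if label in ("pos", "neg"):
                counts[date][label] += 1
    return dict(counts)


# Toy embeddings standing in for Word2Vec vectors (hypothetical values).
vectors = {
    "safe": (1.0, 0.1),
    "recovered": (0.9, 0.2),
    "fear": (0.1, 1.0),
    "outbreak": (0.2, 0.9),
    "today": (0.7, 0.7),
}
seed = {"safe": "pos", "fear": "neg"}

# Hypothetical social media posts with their posting dates.
posts = [
    ("2021-01-01", "Fear of a new outbreak today"),
    ("2021-01-01", "Stay safe"),
    ("2021-01-02", "Many patients recovered and feel safe"),
]

lexicon = expand_lexicon(seed, vectors)
features = daily_polarity(posts, lexicon)
print(lexicon)
print(features)
```

In a pipeline of the kind the abstract outlines, the per-day positive and negative counts would presumably be combined with past confirmed-case counts as inputs to the DNN, LSTM, or GRU model; that step is not shown here.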