Ntompras Charalampos, Drosatos George, Kaldoudi Eleni
School of Medicine, Democritus University of Thrace, Alexandroupoli, Greece.
Institute for Language and Speech Processing, Athena Research Center, Xanthi, Greece.
J Comput Soc Sci. 2022;5(1):687-729. doi: 10.1007/s42001-021-00150-8. Epub 2021 Oct 20.
The COVID-19 pandemic has deeply impacted all aspects of social, professional, and financial life, with concerns and responses readily published on online social media worldwide. This study employs probabilistic text mining techniques for a large-scale, high-resolution temporal and geospatial content analysis of related Twitter discussions. The analysis considered 20,230,833 original English-language COVID-19-related tweets of global origin, retrieved between January 25, 2020 and April 30, 2020. Fine-grained topic analysis identified 91 meaningful topics. Most of the topics showed a temporal evolution with local maxima, underlining the short-lived character of discussions on Twitter. When compared to real-world events, temporal popularity curves showed a good correlation with, and a quick response to, real-world triggers. Geospatial analysis of topics showed that approximately 30% of original English-language tweets were contributed by USA-based users, while overall more than 60% of the English-language tweets were contributed by users from countries with an official language other than English. High-resolution temporal and geospatial analysis of Twitter content shows potential for political, economic, and social monitoring at the global and national levels.