Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757, Sankt Augustin, Germany.
Bonn-Aachen International Center for IT, University of Bonn, Friedrich Hirzebruch-Allee 6, 53115, Bonn, Germany.
Sci Rep. 2023 Nov 27;13(1):20780. doi: 10.1038/s41598-023-48096-3.
The COVID-19 pandemic has highlighted the need for new technical approaches to increase the preparedness of healthcare systems. One important measure is the development of innovative early warning systems. To this end, we first compiled a corpus of relevant COVID-19-related symptoms with the help of a disease ontology, text mining and statistical analysis. Subsequently, we applied statistical and machine learning (ML) techniques to time series data of symptom-related Google searches and tweets spanning the period from March 2020 to June 2022. We found that a long short-term memory (LSTM) network jointly trained on COVID-19 symptom-related Google Trends and Twitter data was able to accurately forecast up-trends in classical surveillance data (confirmed cases and hospitalization rates) 14 days ahead. F1 scores were above 98% for confirmed cases and above 97% for hospitalization rates, demonstrating the potential of digital traces for building an early alert system for pandemics in Germany.
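To make the evaluation setup concrete, the following is a minimal sketch of how a 14-day-ahead up-trend target and an F1 score could be computed from a daily surveillance series. The abstract does not give the paper's exact up-trend definition, so the rule used here (the value 14 days later exceeds today's value by more than a relative margin) is an assumption, and the names `label_uptrends`, `f1_score`, `horizon` and `min_rise` are hypothetical. The LSTM itself is not reproduced.

```python
from typing import List, Sequence


def label_uptrends(series: Sequence[float], horizon: int = 14,
                   min_rise: float = 0.0) -> List[int]:
    """Label day t as 1 if the value `horizon` days later is higher.

    Assumed rule (not taken from the paper): an up-trend holds when
    series[t + horizon] > series[t] * (1 + min_rise).
    The last `horizon` days get no label because their future is unknown.
    """
    labels = []
    for t in range(len(series) - horizon):
        now, later = series[t], series[t + horizon]
        labels.append(1 if later > now * (1.0 + min_rise) else 0)
    return labels


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for a, b in pairs if a == 1 and b == 1)
    fp = sum(1 for a, b in pairs if a == 0 and b == 1)
    fn = sum(1 for a, b in pairs if a == 1 and b == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In such a setup, `label_uptrends` would be applied to the confirmed-case or hospitalization series to obtain the targets, and `f1_score` would compare those targets with the up-trend alerts produced by a forecasting model.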