Sarker Abeed, Ge Yao
Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, Georgia, USA.
JAMIA Open. 2021 Sep 2;4(3):ooab075. doi: 10.1093/jamiaopen/ooab075. eCollection 2021 Jul.
Our objective was to mine Reddit to discover long-COVID symptoms self-reported by users, compare symptom distributions across studies, and create a symptom lexicon. We retrieved posts from the subreddit and extracted symptoms via approximate matching using an expanded meta-lexicon. We mapped the extracted symptoms to standard concept IDs, compared their distributions with those reported in recent literature, and analyzed their distributions over time. From 42 995 posts by 4249 users, we identified 1744 users who reported at least 1 symptom. Among these 1744 users, the 5 most frequently reported long-COVID symptoms were mentioned by 55.2%, 51.2%, 48.4%, 32.8%, and 28.9% of users. Comparison with recent literature revealed wide variation in reported symptoms across studies. Temporal analysis showed that several symptoms persisted up to 15 months after infection. The spectrum of symptoms identified from Reddit may provide early insights into long-COVID.
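The extraction step described above (approximate matching of post text against a lexicon, followed by mapping to concept IDs) can be illustrated with a minimal sketch. The abstract does not specify the matching algorithm, so this example substitutes the standard-library `difflib.SequenceMatcher` as a stand-in. The lexicon entries, the concept IDs (placeholders, not real identifiers), the function names, and the 0.85 threshold are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mini-lexicon: symptom expression -> placeholder concept ID.
# These IDs are illustrative only and do not come from any real vocabulary.
LEXICON = {
    "shortness of breath": "CONCEPT_0001",
    "fatigue": "CONCEPT_0002",
    "brain fog": "CONCEPT_0003",
}


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def extract_symptoms(post: str, lexicon=LEXICON, threshold: float = 0.85) -> set:
    """Return the concept IDs whose expressions approximately occur in the post.

    Each lexicon expression is compared against every window of the same
    number of words in the post, so misspellings can still match.
    """
    tokens = re.findall(r"[a-z']+", post.lower())
    found = set()
    for expression, concept_id in lexicon.items():
        n = len(expression.split())
        for i in range(len(tokens) - n + 1):
            window = " ".join(tokens[i:i + n])
            if similarity(window, expression) >= threshold:
                found.add(concept_id)
                break  # one match per expression is enough
    return found


def symptom_users(posts_by_user: dict) -> dict:
    """Map each user to the set of concept IDs found across their posts,
    keeping only users with at least one extracted symptom."""
    result = {}
    for user, posts in posts_by_user.items():
        concepts = set()
        for post in posts:
            concepts |= extract_symptoms(post)
        if concepts:
            result[user] = concepts
    return result
```

Calling `symptom_users` on a mapping of user names to post lists yields the per-user symptom sets from which proportions like those reported above could be computed. A production pipeline would likely need a larger lexicon, negation handling, and a tuned threshold.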