Leveraging machine learning approaches for predicting potential Lyme disease cases and incidence rates in the United States using Twitter.

Affiliations

Harvard Extension School, Harvard University, Cambridge, USA.

Department of Social and Preventive Medicine, École de Santé Publique, University of Montreal, Montréal, Canada.

Publication information

BMC Med Inform Decis Mak. 2023 Oct 16;23(1):217. doi: 10.1186/s12911-023-02315-z.

Abstract

BACKGROUND

Lyme disease is one of the most commonly reported infectious diseases in the United States (US), accounting for more than [Formula: see text] of all vector-borne diseases in North America.

OBJECTIVE

In this paper, self-reported tweets on Twitter were analyzed in order to predict potential Lyme disease cases and accurately assess incidence rates in the US.

METHODS

The study was conducted in three stages: (1) Approximately 1.3 million tweets were collected and pre-processed to extract the most relevant Lyme disease tweets with geolocations. A subset of the tweets was semi-automatically labelled as relevant or irrelevant to Lyme disease using a set of precise keywords, and the remainder was manually labelled, yielding a curated labelled dataset of 77,500 tweets. (2) This labelled dataset was used to train, validate, and test various combinations of NLP word embedding methods and prominent ML classification models, such as TF-IDF with logistic regression, Word2vec with XGBoost, and BERTweet, among others, to identify potential Lyme disease tweets. (3) Lastly, the presence of spatio-temporal patterns in the US over a 10-year period was studied.
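The semi-automatic labelling step in stage (1) can be illustrated with a minimal sketch. The abstract does not list the keywords or the matching rules that were used, so the patterns, the function name `semi_auto_label`, and the three-way outcome below are assumptions made purely for illustration, not the authors' procedure.

```python
import re

# Hypothetical keyword patterns, for illustration only; the study's actual
# keyword set is not given in the abstract.
RELEVANT_PATTERNS = [r"\blyme disease\b", r"\btick bite\b", r"\bborrelia\b"]
# Place names that contain "Lyme" but usually do not refer to the disease.
IRRELEVANT_PATTERNS = [r"\bold lyme\b", r"\blyme regis\b"]


def semi_auto_label(tweet: str) -> str:
    """Return 'relevant', 'irrelevant', or 'manual' for a single tweet."""
    text = tweet.lower()
    # Check the exclusion patterns first so place names are not mislabelled.
    if any(re.search(p, text) for p in IRRELEVANT_PATTERNS):
        return "irrelevant"
    if any(re.search(p, text) for p in RELEVANT_PATTERNS):
        return "relevant"
    # No precise keyword matched: leave the tweet for a human annotator.
    return "manual"


# Invented example tweets, not taken from the study's dataset.
example_tweets = [
    "Just diagnosed with Lyme disease after a tick bite last month",
    "Beautiful sunset over Old Lyme tonight",
    "Feeling tired and achy again, no idea why",
]
example_labels = [semi_auto_label(t) for t in example_tweets]
```

Tweets that fall through to `manual` correspond to the portion that the study labelled by hand. Checking exclusions before inclusions is one possible design choice that favours precision over recall.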

RESULTS

Preliminary results showed that BERTweet outperformed all tested NLP classifiers for identifying Lyme disease tweets, achieving the highest classification accuracy and F1-score of [Formula: see text]. There was also a consistent pattern indicating that the West and Northeast regions of the US had a higher tweet rate over time.
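For reference, the two reported metrics can be computed from true and predicted labels as in the sketch below. It assumes a binary task in which 1 marks a relevant Lyme disease tweet. The function name and the toy labels are hypothetical and are not results from the study.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and the F1-score of the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration (1 = relevant, 0 = irrelevant).
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 0, 0, 1, 1, 0]
toy_accuracy, toy_f1 = accuracy_and_f1(toy_true, toy_pred)
```

Accuracy counts all correct predictions, while F1 only rewards correct detection of the positive class, which is why the two are usually reported together when relevant and irrelevant tweets may be imbalanced.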

CONCLUSIONS

We focused on the less-studied problem of using Twitter data as a surveillance tool for Lyme disease in the US. Several crucial findings emerged from the study. First, there is a fairly strong correlation between classified tweet counts and Lyme disease case counts, with both following similar trends. Second, in 2015 and early 2016, social media networks such as Twitter were essential in raising public awareness of Lyme disease. Third, counties with a high incidence rate did not necessarily have a high tweet rate, and vice versa. Fourth, BERTweet can be used as a reliable NLP classifier for detecting relevant Lyme disease tweets.
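The first and third findings rest on two simple quantities: a correlation between two count series, and a rate normalised by population. The abstract does not state which correlation coefficient or which rate denominator was used, so the sketch below assumes Pearson's r and a rate per 100,000 residents. All numbers are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


def rate_per_100k(count, population):
    """Events per 100,000 residents (assumed normalisation)."""
    return count * 100_000 / population


# Invented yearly series, NOT data from the study.
toy_tweet_counts = [120, 150, 310, 280, 200]
toy_case_counts = [25000, 27000, 38000, 36000, 30000]
toy_r = pearson_r(toy_tweet_counts, toy_case_counts)

# Invented county: 50 classified tweets among 200,000 residents.
toy_tweet_rate = rate_per_100k(50, 200_000)
```

Comparing rates rather than raw counts matters for the county-level finding, because a populous county can produce many tweets while still having a low incidence rate.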

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/56a2/10578027/874c80a73997/12911_2023_2315_Fig1_HTML.jpg
