Leveraging machine learning approaches for predicting potential Lyme disease cases and incidence rates in the United States using Twitter.

Affiliations

Harvard Extension School, Harvard University, Cambridge, USA.

Department of Social and Preventive Medicine, École de Santé Publique, University of Montreal, Montréal, Canada.

Publication information

BMC Med Inform Decis Mak. 2023 Oct 16;23(1):217. doi: 10.1186/s12911-023-02315-z.

DOI: 10.1186/s12911-023-02315-z
PMID: 37845666
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10578027/
Abstract

BACKGROUND

Lyme disease is one of the most commonly reported infectious diseases in the United States (US), accounting for more than [Formula: see text] of all vector-borne diseases in North America.

OBJECTIVE

In this paper, self-reported tweets were analyzed to predict potential Lyme disease cases and to accurately assess incidence rates in the US.

METHODS

The study was conducted in three stages: (1) Approximately 1.3 million tweets were collected and pre-processed to extract the most relevant geolocated Lyme disease tweets. A subset of tweets was semi-automatically labelled as relevant or irrelevant to Lyme disease using a set of precise keywords, and the remainder was labelled manually, yielding a curated labelled data set of 77,500 tweets. (2) This labelled data set was used to train, validate, and test various combinations of natural language processing (NLP) word embedding methods and prominent machine learning (ML) classification models, such as TF-IDF with logistic regression, Word2vec with XGBoost, and BERTweet, among others, to identify potential Lyme disease tweets. (3) Finally, spatio-temporal patterns across the US over a 10-year period were studied.
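Stages (1) and (2) can be illustrated with the simplest pairing the abstract names, TF-IDF features with logistic regression, preceded by a keyword pre-labelling step. The sketch below is a minimal, dependency-free illustration under stated assumptions, not the authors' pipeline. The keyword list, the toy tweets, the tokenizer and every function name (`keyword_label`, `fit_idf`, `tfidf`, `train_logreg`, `predict`) are hypothetical. The pre-labeller only flags likely positives and leaves everything else for manual review.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# HYPOTHETICAL keyword list: the study's actual "precise keywords" are not given in the abstract.
PRECISE_KEYWORDS = ("lyme disease", "bullseye rash", "erythema migrans")


def keyword_label(text: str) -> int | None:
    """Pre-label: 1 if a precise keyword occurs, else None (left for manual labelling)."""
    lowered = text.lower()
    return 1 if any(k in lowered for k in PRECISE_KEYWORDS) else None


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep alphanumeric runs (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """L2-normalised sparse TF-IDF vector; terms unseen during fitting are dropped."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x: dict[str, float], w: dict[str, float], b: float) -> float:
    """Linear score b + w . x over the sparse feature dict."""
    return b + sum(w.get(t, 0.0) * v for t, v in x.items())


def train_logreg(xs, ys, lr=0.5, epochs=500):
    """Full-batch gradient descent on the unregularised logistic (cross-entropy) loss."""
    w: dict[str, float] = {}
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w: dict[str, float] = {}
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(score(x, w, b)) - y  # dLoss/dScore for one example
            grad_b += err
            for t, v in x.items():
                grad_w[t] = grad_w.get(t, 0.0) + err * v
        b -= lr * grad_b / n
        for t, g in grad_w.items():
            w[t] = w.get(t, 0.0) - lr * g / n
    return w, b


def predict(x: dict[str, float], w: dict[str, float], b: float) -> int:
    """Return 1 (Lyme-relevant) when the predicted probability is at least 0.5."""
    return int(sigmoid(score(x, w, b)) >= 0.5)


# Toy, made-up training tweets (1 = relevant to Lyme disease, 0 = irrelevant).
TRAIN = [
    ("Diagnosed with Lyme disease after a tick bite last month", 1),
    ("Still exhausted from my Lyme treatment, antibiotics again", 1),
    ("Tick bite and a bullseye rash, heading to the doctor", 1),
    ("Key lime pie recipe for the weekend", 0),
    ("Lyme Regis beach was lovely today", 0),
    ("Great game last night, what a finish", 0),
]

docs = [tokenize(text) for text, _ in TRAIN]
idf = fit_idf(docs)
X = [tfidf(d, idf) for d in docs]
y = [label for _, label in TRAIN]
w, b = train_logreg(X, y)
```

Plain gradient descent without regularisation keeps the example short. With real data one would add regularisation and keep separate validation and test splits, as the study describes.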

RESULTS

Preliminary results showed that BERTweet outperformed all tested NLP classifiers for identifying Lyme disease tweets, achieving the highest classification accuracy and F1-score of [Formula: see text]. There was also a consistent pattern indicating that the West and Northeast regions of the US had a higher tweet rate over time.
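The abstract reports accuracy and F1-score but does not state how they were computed. A minimal sketch of both follows. It assumes a binary task where "relevant" (1) is the positive class. The study may have averaged F1 differently, for example macro or weighted across classes. The function `binary_scores` and the labels are hypothetical.

```python
from __future__ import annotations


def binary_scores(y_true: list[int], y_pred: list[int], positive: int = 1) -> dict[str, float]:
    """Accuracy plus precision, recall and F1 for the `positive` class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 1 = Lyme-relevant tweet, 0 = irrelevant.
scores = binary_scores([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 1, 1, 1, 0])
```

With these made-up labels there are 3 true positives, 2 false positives and 1 false negative. That gives accuracy 5/8 = 0.625, precision 0.6, recall 0.75 and F1 = 2(0.6)(0.75)/1.35 ≈ 0.667. The two metrics can diverge, which is why both are usually reported.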

CONCLUSIONS

We focused on the less-studied problem of using Twitter data as a surveillance tool for Lyme disease in the US. Several crucial findings emerged from the study. First, there is a fairly strong correlation between classified tweet counts and Lyme disease case counts, with both following similar trends. Second, in 2015 and early 2016, social media networks such as Twitter were essential in raising public awareness of Lyme disease. Third, counties with a high incidence rate did not necessarily have a high tweet rate, and vice versa. Fourth, BERTweet can be used as a reliable NLP classifier for detecting relevant Lyme disease tweets.
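The conclusions compare classified tweet counts with Lyme disease counts, and tweet rates with incidence rates. The abstract neither names the correlation measure nor defines "tweet rate". The sketch below therefore assumes Pearson's r, and a rate per 100,000 residents, the scale on which Lyme disease incidence is commonly reported. `pearson_r` and `rate_per_100k` are hypothetical helpers, not the authors' code.

```python
from __future__ import annotations

import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long series (e.g. yearly counts)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def rate_per_100k(count: float, population: float) -> float:
    """Events per 100,000 residents; ASSUMED definition for both tweet and incidence rates."""
    return count * 100_000 / population
```

Dividing both tweets and cases by the same county population puts them on a common scale. The rates can then be compared county by county even when raw counts differ by orders of magnitude.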

Figures

Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/56a2/10578027/874c80a73997/12911_2023_2315_Fig1_HTML.jpg
Fig 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/56a2/10578027/3596b1b83850/12911_2023_2315_Fig2_HTML.jpg
Fig 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/56a2/10578027/050dc94c943d/12911_2023_2315_Fig3_HTML.jpg
Fig 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/56a2/10578027/a3da5062534b/12911_2023_2315_Fig4_HTML.jpg

Similar articles

1
Leveraging machine learning approaches for predicting potential Lyme disease cases and incidence rates in the United States using Twitter.
BMC Med Inform Decis Mak. 2023 Oct 16;23(1):217. doi: 10.1186/s12911-023-02315-z.
2
Identifying Potential Lyme Disease Cases Using Self-Reported Worldwide Tweets: Deep Learning Modeling Approach Enhanced With Sentimental Words Through Emojis.
J Med Internet Res. 2023 Oct 16;25:e47014. doi: 10.2196/47014.
3
Mapping tweets to a known disease epidemiology; a case study of Lyme disease in the United Kingdom and Republic of Ireland.
J Biomed Inform. 2019;100S:100060. doi: 10.1016/j.yjbinx.2019.100060. Epub 2019 Oct 18.
4
Toward Using Twitter for Tracking COVID-19: A Natural Language Processing Pipeline and Exploratory Data Set.
J Med Internet Res. 2021 Jan 22;23(1):e25314. doi: 10.2196/25314.
5
Machine Learning Classifiers for Twitter Surveillance of Vaping: Comparative Machine Learning Study.
J Med Internet Res. 2020 Aug 12;22(8):e17478. doi: 10.2196/17478.
6
A natural language processing pipeline to advance the use of Twitter data for digital epidemiology of adverse pregnancy outcomes.
J Biomed Inform. 2020;112S:100076. doi: 10.1016/j.yjbinx.2020.100076. Epub 2020 Aug 8.
7
Social media mining for birth defects research: A rule-based, bootstrapping approach to collecting data for rare health-related events on Twitter.
J Biomed Inform. 2018 Nov;87:68-78. doi: 10.1016/j.jbi.2018.10.001. Epub 2018 Oct 4.
8
Classification of Twitter Vaping Discourse Using BERTweet: Comparative Deep Learning Study.
JMIR Med Inform. 2022 Jul 21;10(7):e33678. doi: 10.2196/33678.
9
Detecting Potentially Harmful and Protective Suicide-Related Content on Twitter: Machine Learning Approach.
J Med Internet Res. 2022 Aug 17;24(8):e34705. doi: 10.2196/34705.
10
Comparison of pretrained transformer-based models for influenza and COVID-19 detection using social media text data in Saskatchewan, Canada.
Front Digit Health. 2023 Jun 28;5:1203874. doi: 10.3389/fdgth.2023.1203874. eCollection 2023.

Articles citing this paper

1
Different environmental factors predict the occurrence of tick-borne encephalitis virus (TBEV) and reveal new potential risk areas across Europe via geospatial models.
Int J Health Geogr. 2025 Mar 14;24(1):3. doi: 10.1186/s12942-025-00388-9.
2
Application of large language models in disease diagnosis and treatment.
Chin Med J (Engl). 2025 Jan 20;138(2):130-142. doi: 10.1097/CM9.0000000000003456. Epub 2024 Dec 26.
3
Identifying the geographic leading edge of Lyme disease in the United States with internet searches: A spatiotemporal analysis of Google Health Trends data.
PLoS One. 2024 Nov 13;19(11):e0312277. doi: 10.1371/journal.pone.0312277. eCollection 2024.
4
Lyme rashes disease classification using deep feature fusion technique.
Skin Res Technol. 2023 Nov;29(11):e13519. doi: 10.1111/srt.13519.
5
Identifying Potential Lyme Disease Cases Using Self-Reported Worldwide Tweets: Deep Learning Modeling Approach Enhanced With Sentimental Words Through Emojis.
J Med Internet Res. 2023 Oct 16;25:e47014. doi: 10.2196/47014.