EventEpi: A natural language processing framework for event-based surveillance.

Affiliations

Robert Koch Institute (RKI), Berlin, Germany.

Osnabrück University, Osnabrück, Lower Saxony, Germany.

Publication information

PLoS Comput Biol. 2020 Nov 20;16(11):e1008277. doi: 10.1371/journal.pcbi.1008277. eCollection 2020 Nov.

DOI: 10.1371/journal.pcbi.1008277
PMID: 33216746
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7717563/
Abstract

According to the World Health Organization (WHO), around 60% of all outbreaks are detected using informal sources. In many public health institutes, including the WHO and the Robert Koch Institute (RKI), dedicated groups of public health agents sift through numerous articles and newsletters to detect relevant events. This media screening is one important part of event-based surveillance (EBS). Reading the articles, discussing their relevance, and putting key information into a database is a time-consuming process. To support EBS, but also to gain insights into what makes an article and the event it describes relevant, we developed a natural language processing framework for automated information extraction and relevance scoring. First, we scraped the sources used for EBS at the RKI (WHO Disease Outbreak News and ProMED) and automatically extracted each article's key data: disease, country, date, and confirmed-case count. For this, we performed named entity recognition in two steps. First, EpiTator, an open-source epidemiological annotation tool, suggested many candidates for each of these entities. Second, we selected the key entity among the candidates: a heuristic extracted the key country and disease with good results, and a naive Bayes classifier, trained with the RKI's EBS database as labels, found the key date and confirmed-case count with modest performance. Then, for relevance scoring, we defined two classes to which any article might belong: an article is relevant if it is in the EBS database and irrelevant otherwise. We compared the performance of different classifiers using bag-of-words, document embeddings, and word embeddings. The best classifier, a logistic regression, achieved a sensitivity of 0.82 and an index balanced accuracy of 0.61. Finally, we integrated these functionalities into a web application called EventEpi, in which relevant sources are automatically analyzed and put into a database. The user can also provide any URL or text, which will be analyzed in the same way and added to the database. Each of these steps could be improved, in particular with larger labeled datasets and fine-tuning of the learning algorithms. The overall framework, however, already works well and can be used in production, promising improvements in EBS. The source code and data are publicly available under open licenses.
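
The abstract reports two scores for the relevance classifier, sensitivity and index balanced accuracy (IBA), without defining them. The following is a minimal sketch of how such scores can be computed from binary labels; it is not the authors' evaluation code. It assumes the common definition IBA = (1 + alpha * (TPR - TNR)) * TPR * TNR, with the squared geometric mean of sensitivity and specificity as the base metric. The weighting `alpha = 0.1` is an assumption (the abstract does not state it), and the toy labels and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # relevant articles classified as relevant
    fn: int  # relevant articles classified as irrelevant
    tn: int  # irrelevant articles classified as irrelevant
    fp: int  # irrelevant articles classified as relevant


def confusion(y_true, y_pred):
    """Count outcomes for binary labels (1 = relevant, 0 = irrelevant)."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
    )


def sensitivity(c):
    """True positive rate: share of relevant articles that were found."""
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """True negative rate: share of irrelevant articles that were rejected."""
    return c.tn / (c.tn + c.fp)


def index_balanced_accuracy(c, alpha=0.1):
    """IBA with squared G-mean as base metric (alpha is an assumed value)."""
    tpr, tnr = sensitivity(c), specificity(c)
    dominance = tpr - tnr  # positive when the relevant class is favored
    return (1 + alpha * dominance) * tpr * tnr


# Hypothetical toy data: 5 relevant and 10 irrelevant articles.
y_true = [1] * 5 + [0] * 10
y_pred = [1, 1, 1, 1, 0] + [0] * 8 + [1] * 2

c = confusion(y_true, y_pred)
print("sensitivity:", sensitivity(c))
print("specificity:", specificity(c))
print("IBA:", index_balanced_accuracy(c))
```

Because relevant articles are rare compared with irrelevant ones, plain accuracy would reward a classifier that labels everything irrelevant. IBA stays low unless both classes are recognized well, which is presumably why it is reported alongside sensitivity.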

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dede/7717563/761fa5a06267/pcbi.1008277.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dede/7717563/9ac037b83962/pcbi.1008277.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dede/7717563/30ced7a3f230/pcbi.1008277.g002.jpg
