An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics.

Affiliation

The ISIS Center, Georgetown University Medical Center, Washington, DC, USA.

Publication information

Int J Med Inform. 2011 Jan;80(1):56-66. doi: 10.1016/j.ijmedinf.2010.10.015. Epub 2010 Dec 4.

DOI: 10.1016/j.ijmedinf.2010.10.015
PMID: 21134784
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3904285/

Abstract

PURPOSE

Early detection of infectious disease outbreaks is crucial to protecting the public health of a society. Online news articles provide timely information on disease outbreaks worldwide. In this study, we investigated automated detection of articles relevant to disease outbreaks using machine learning classifiers. In a real-life setting, it is expensive to prepare a training data set for classifiers, which usually consists of manually labeled relevant and irrelevant articles. To mitigate this challenge, we examined the use of randomly sampled unlabeled articles as well as labeled relevant articles.

METHODS

Naïve Bayes and Support Vector Machine (SVM) classifiers were trained on 149 relevant and 149 or more randomly sampled unlabeled articles. Diverse classifiers were trained by varying the number of sampled unlabeled articles and also the number of word features. The trained classifiers were applied to 15 thousand articles published over 15 days. Top-ranked articles from each classifier were pooled and the resulting set of 1337 articles was reviewed by an expert analyst to evaluate the classifiers.
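The training and pooling steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the randomly sampled unlabeled articles simply stand in for the negative class, uses a plain multinomial naïve Bayes with Laplace smoothing, and selects word features by raw frequency. The names `train_nb`, `score`, and `pool_top_ranked`, the whitespace tokenizer, and the toy articles are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens; the study's tokenizer is not described here.
    return text.lower().split()


def train_nb(relevant_docs, unlabeled_docs, vocab_size=1000):
    """Multinomial naive Bayes in which unlabeled articles act as the negative class."""
    pos = Counter(w for d in relevant_docs for w in tokenize(d))
    neg = Counter(w for d in unlabeled_docs for w in tokenize(d))
    # Keep the most frequent words as features (a stand-in for real feature selection).
    vocab = [w for w, _ in (pos + neg).most_common(vocab_size)]
    pos_total = sum(pos[w] for w in vocab)
    neg_total = sum(neg[w] for w in vocab)
    v = len(vocab)
    # Laplace-smoothed log-likelihood ratio for each word feature.
    llr = {
        w: math.log((pos[w] + 1) / (pos_total + v))
        - math.log((neg[w] + 1) / (neg_total + v))
        for w in vocab
    }
    prior = math.log(len(relevant_docs) / len(unlabeled_docs))
    return llr, prior


def score(model, text):
    """Log-odds that an article is relevant; words outside the vocabulary contribute nothing."""
    llr, prior = model
    return prior + sum(llr.get(w, 0.0) for w in tokenize(text))


def pool_top_ranked(models, articles, k):
    """Union of the indices of each model's k highest-scoring articles."""
    pooled = set()
    for model in models:
        ranked = sorted(range(len(articles)),
                        key=lambda i: score(model, articles[i]),
                        reverse=True)
        pooled.update(ranked[:k])
    return sorted(pooled)


# Toy data, invented for illustration only.
relevant = ["cholera outbreak reported in the city", "flu outbreak kills poultry"]
unlabeled = ["stock market rises in the city", "football team wins the match",
             "flu outbreak reported"]
# Diverse classifiers: same data, different numbers of word features.
models = [train_nb(relevant, unlabeled, vocab_size=n) for n in (5, 1000)]
to_review = pool_top_ranked(models, ["cholera outbreak", "football market"], k=1)
```

Only the pooled indices would go to the analyst for review, which is what keeps the labeling cost bounded when many classifier variants are compared.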

RESULTS

Daily averages of the areas under the ROC curves (AUCs) over the 15-day evaluation period were 0.841 and 0.836 for the naïve Bayes and SVM classifiers, respectively. We consulted a database of disease outbreak reports to confirm that the evaluation data set produced by the pooling method did cover the incidents recorded in the database during the evaluation period.
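As a rough illustration of the reported metric, the sketch below computes an AUC for each day from classifier scores and expert labels, then averages over days. It uses the rank-based definition of AUC: the probability that a randomly chosen relevant article outscores a randomly chosen irrelevant one, with ties counted as one half. The function names and the toy numbers are hypothetical, and how the study handled articles outside the reviewed pool is not stated in the abstract, so that step is not modeled here.

```python
def auc(scores, labels):
    """Rank-based AUC: P(relevant score > irrelevant score), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one relevant and one irrelevant article")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def daily_average_auc(days):
    """Mean of per-day AUCs; `days` is a list of (scores, labels) pairs, one per day."""
    per_day = [auc(scores, labels) for scores, labels in days]
    return sum(per_day) / len(per_day)


# Toy example with two days of scored, expert-labeled articles (invented values).
days = [
    ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]),  # day 1: AUC = 3/4
    ([0.7, 0.2], [1, 0]),                  # day 2: AUC = 1
]
average = daily_average_auc(days)  # (0.75 + 1.0) / 2 = 0.875
```

Averaging per-day AUCs weights each day equally regardless of how many articles were published that day, which differs from computing a single AUC over all 15 days at once.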

CONCLUSIONS

The proposed text classification framework utilizing randomly sampled unlabeled articles can facilitate a cost-effective approach to training machine learning classifiers in a real-life Internet-based biosurveillance project. We plan to examine this framework further using larger data sets and using articles in non-English languages.
