Development of a global infectious disease activity database using natural language processing, machine learning, and human expertise.

Affiliations

Harvard University, School of Engineering and Applied Sciences, Cambridge, Massachusetts, USA.

Li Ka Shing Knowledge Institute, St. Michael's Hospital, Toronto, Ontario, Canada.

Publication information

J Am Med Inform Assoc. 2019 Nov 1;26(11):1355-1359. doi: 10.1093/jamia/ocz112.

Abstract

OBJECTIVE

We assessed whether machine learning can be used to efficiently extract infectious disease activity information from online media reports.

MATERIALS AND METHODS

We curated a data set of labeled media reports (n = 8322) indicating which articles contain updates about disease activity, and trained a classifier on it. To validate our system, we used a held-out test set and compared our articles to the World Health Organization (WHO) Disease Outbreak News reports.

RESULTS

Our classifier achieved a recall of 88.8% and a precision of 86.1%. Of the WHO-identified outbreaks that were covered by online media (89%), the overall surveillance system detected 94%, and did so 43.4 (IQR: 9.5-61) days earlier on average.
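
For readers less familiar with these metrics, the following is a minimal sketch of how precision and recall are computed from classification counts, and of the F1 score implied by the two reported figures. The function names and the example counts are illustrative assumptions and do not come from the paper; only the 86.1% and 88.8% values are taken from the abstract.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall) from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp)  # share of flagged articles that are truly relevant
    recall = tp / (tp + fn)     # share of truly relevant articles that were flagged
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic (not from the paper).
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.3f}, recall={r:.3f}")

# F1 implied by the precision (86.1%) and recall (88.8%) reported in the abstract.
reported_f1 = f1_score(0.861, 0.888)
print(f"F1 from reported values: {reported_f1:.3f}")
```

The F1 value here is derived from the two reported metrics rather than stated by the authors, so it should be read as a convenience summary only.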

DISCUSSION

We constructed a global real-time disease activity database surveilling 114 illnesses and syndromes. We must further assess our system for bias, representativeness, granularity, and accuracy.

CONCLUSION

Machine learning, natural language processing, and human expertise can be used to efficiently identify disease activity from digital media reports.
