Zhang Yulei, Dang Yan, Chen Hsinchun, Thurmond Mark, Larson Cathy
Artificial Intelligence Lab, Department of Management Information Systems, Eller College of Management, University of Arizona, Tucson, AZ 85721, USA.
FMD Lab, Center for Animal Disease Modeling and Surveillance (CADMS), University of California, Davis, CA 95616, USA.
Decis Support Syst. 2009 Nov;47(4):508-517. doi: 10.1016/j.dss.2009.04.016. Epub 2009 May 4.
Syndromic surveillance can play an important role in protecting public health against infectious diseases. Infectious disease outbreaks can devastate both society and the economy, so global awareness is critical to guarding against major outbreaks. By monitoring online news sources with an accurate news classification system for syndromic surveillance, public health personnel can be apprised of outbreaks and potential outbreak situations. In this study, we developed a framework for automatic online news monitoring and classification for syndromic surveillance. The framework is unique: none of the techniques adopted in this study has previously been used for syndromic surveillance of infectious diseases. In classification experiments, we compared the performance of different feature subsets across different machine learning algorithms. The results showed that the combined feature subsets, comprising Bag of Words, Noun Phrases, and Named Entities features, outperformed the Bag of Words feature subsets alone. Furthermore, feature selection improved the performance of the feature subsets in online news classification. The highest classification performance was achieved by applying a support vector machine (SVM) to the combined feature subset after feature selection.
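The idea of combining feature subsets and then applying feature selection can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the feature selection method, so information gain is used here as one common choice; the "noun phrase" and "named entity" extractors are crude stand-ins (adjacent word pairs and capitalized tokens) rather than a real chunker or entity recognizer; the four headlines are invented toy data; and the function names `extract_features`, `information_gain`, and `select_top` are hypothetical. The selected features would then be passed to a classifier such as an SVM, which is not implemented here.

```python
import math
import re
from collections import Counter


def extract_features(text):
    """Return a set of prefixed features from three illustrative subsets."""
    tokens = re.findall(r"[A-Za-z]+", text)
    feats = set()
    # Bag of Words: lowercased single tokens.
    feats.update("bow=" + t.lower() for t in tokens)
    # Stand-in for noun phrases: adjacent word pairs (assumption, not a chunker).
    for a, b in zip(tokens, tokens[1:]):
        feats.add("np=" + a.lower() + "_" + b.lower())
    # Stand-in for named entities: capitalized tokens after the first word.
    for i, t in enumerate(tokens):
        if i > 0 and t[0].isupper():
            feats.add("ne=" + t)
    return feats


def entropy(pos, neg):
    """Binary entropy in bits for a pair of class counts."""
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels):
    """Score each binary feature by its information gain about the 0/1 label."""
    n = len(docs)
    n_pos = sum(labels)
    base = entropy(n_pos, n - n_pos)
    present = Counter()      # documents containing the feature
    present_pos = Counter()  # positive documents containing the feature
    for feats, y in zip(docs, labels):
        for f in feats:
            present[f] += 1
            present_pos[f] += y
    gains = {}
    for f, k in present.items():
        kp = present_pos[f]
        with_f = entropy(kp, k - kp)
        without_f = entropy(n_pos - kp, (n - k) - (n_pos - kp))
        gains[f] = base - (k / n) * with_f - ((n - k) / n) * without_f
    return gains


def select_top(gains, k):
    """Keep the k highest-scoring features, breaking ties alphabetically."""
    ranked = sorted(gains.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f for f, _ in ranked[:k]]


# Invented toy headlines: 1 = outbreak-related, 0 = unrelated.
texts = [
    "Foot and mouth disease outbreak confirmed near Surrey farms",
    "Officials report new outbreak of avian flu in Vietnam",
    "Stock markets rally as Tokyo shares climb",
    "Football club signs new striker from Madrid",
]
labels = [1, 1, 0, 0]
docs = [extract_features(t) for t in texts]
gains = information_gain(docs, labels)
print(select_top(gains, 1))
```

Prefixing each feature with its subset name keeps the three subsets distinguishable inside a single combined feature space, so a selection step can rank features from all subsets together.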