Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology, 300 Longwood Ave, Boston, MA 02115, USA.
BMC Bioinformatics. 2009 Nov 24;10:385. doi: 10.1186/1471-2105-10-385.
Automated surveillance of the Internet provides a timely and sensitive method for alerting on emerging infectious disease threats worldwide. HealthMap is part of a new generation of online systems designed to monitor and visualize, in real time, disease outbreak alerts reported by online news media and public health sources. HealthMap is of particular interest to national and international public health organizations and to international travelers. One task that makes such surveillance useful is the automated discovery of the geographic references contained in the retrieved outbreak alerts, a task sometimes referred to as "geo-parsing". A typical approach to geo-parsing would require an expensive training corpus of alerts manually tagged by a human.
Given that human readers perform this kind of task by using both their lexical and contextual knowledge, we developed an approach that relies on a relatively small expert-built gazetteer, thus limiting the need for human input, and that focuses on learning the context in which geographic references appear. We show in a set of experiments that this approach exhibits a substantial capacity to discover geographic locations outside of its initial lexicon.
The results of this analysis provide a framework for future automated global surveillance efforts that reduce manual input and improve timeliness of reporting.
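The idea of combining a small gazetteer with learned context can be illustrated with a minimal sketch. This is not the authors' method, which the abstract does not specify; it is a toy stand-in that assumes the only context feature is the word immediately preceding a place name. The gazetteer entries, the training alerts, the function names learn_contexts and find_locations, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical toy gazetteer and alerts, for illustration only.
GAZETTEER = {"Boston", "Nairobi", "Lima"}

TRAINING_ALERTS = [
    "Cholera outbreak reported in Nairobi this week",
    "Officials in Lima confirmed new dengue cases",
    "Measles cases reported in Boston schools",
    "Officials confirmed new cases near Lima yesterday",
]


def learn_contexts(alerts, gazetteer):
    """Count how often each word precedes any token, and how often
    it precedes a token that is a known place name."""
    before_place = Counter()
    total = Counter()
    for alert in alerts:
        tokens = alert.split()
        for prev, tok in zip(tokens, tokens[1:]):
            key = prev.lower()
            total[key] += 1
            if tok in gazetteer:
                before_place[key] += 1
    return before_place, total


def find_locations(alert, gazetteer, before_place, total, threshold=0.5):
    """Return gazetteer matches plus capitalized tokens whose preceding
    word was followed by a place name often enough in training."""
    found = []
    tokens = alert.split()
    for i, tok in enumerate(tokens):
        if tok in gazetteer:
            found.append(tok)  # lexical knowledge: direct gazetteer hit
            continue
        if i == 0 or not tok[0].isupper():
            continue  # skip sentence-initial and lowercase tokens
        prev = tokens[i - 1].lower()
        # contextual knowledge: fraction of times `prev` preceded a place
        if total[prev] and before_place[prev] / total[prev] >= threshold:
            found.append(tok)
    return found


before_place, total = learn_contexts(TRAINING_ALERTS, GAZETTEER)
new_alert = "Avian influenza detected in Hanoi and confirmed near Boston"
locations = find_locations(new_alert, GAZETTEER, before_place, total)
```

In this sketch "Hanoi" is absent from the gazetteer but is still proposed as a location because it follows "in", a word that preceded known place names in the toy training alerts. A realistic system would need richer context features, proper tokenization, and handling of multi-word place names.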