Freifeld Clark C, Mandl Kenneth D, Reis Ben Y, Brownstein John S
Children's Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, 300 Longwood Ave., Boston, MA 02115, USA.
J Am Med Inform Assoc. 2008 Mar-Apr;15(2):150-7. doi: 10.1197/jamia.M2544. Epub 2007 Dec 20.
Unstructured electronic information sources, such as news reports, are proving to be valuable inputs for public health surveillance. However, staying abreast of current disease outbreaks requires scouring a continually growing number of disparate news sources and alert services, resulting in information overload. Our objective is to address this challenge through the HealthMap.org Web application, an automated system for querying, filtering, integrating and visualizing unstructured reports on disease outbreaks.
This report describes the design principles, software architecture and implementation of HealthMap and discusses key challenges and future plans.
We describe the process by which HealthMap collects and integrates outbreak data from a variety of sources, including news media (e.g., Google News), expert-curated accounts (e.g., ProMED Mail), and validated official alerts. Through the use of text processing algorithms, the system classifies alerts by location and disease and then overlays them on an interactive geographic map. We measure the accuracy of the classification algorithms based on the level of human curation necessary to correct misclassifications, and examine geographic coverage.
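To make the classification and accuracy measurement concrete, the following is a minimal sketch and not the HealthMap implementation. It assumes a simple dictionary lookup over headline text; the dictionaries, the `classify` and `accuracy` functions, and the sample phrases are all hypothetical, since the abstract does not specify the actual algorithms.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical keyword dictionaries, invented for illustration only.
DISEASES: Dict[str, str] = {
    "avian influenza": "Avian Influenza",
    "bird flu": "Avian Influenza",
    "cholera": "Cholera",
    "dengue": "Dengue",
}
LOCATIONS: Dict[str, str] = {
    "indonesia": "Indonesia",
    "jakarta": "Indonesia",
    "kenya": "Kenya",
    "brazil": "Brazil",
}


@dataclass(frozen=True)
class Classification:
    disease: Optional[str]
    location: Optional[str]


def _first_match(text: str, dictionary: Dict[str, str]) -> Optional[str]:
    """Return the label of the longest dictionary phrase found in text, else None."""
    lowered = text.lower()
    # Try longer phrases first so multi-word names win over shorter ones.
    for phrase in sorted(dictionary, key=len, reverse=True):
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            return dictionary[phrase]
    return None


def classify(headline: str) -> Classification:
    """Assign a disease and a location label to one report headline."""
    return Classification(
        disease=_first_match(headline, DISEASES),
        location=_first_match(headline, LOCATIONS),
    )


def accuracy(automated: List[Classification], curated: List[Classification]) -> float:
    """Fraction of reports whose automated labels needed no curator correction."""
    if not automated or len(automated) != len(curated):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(a == c for a, c in zip(automated, curated))
    return correct / len(automated)
```

Under these assumptions, a report counts as correctly classified only when a human curator leaves both the disease and the location label unchanged, which is one plausible reading of measuring accuracy by the curation needed to correct misclassifications.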
As part of the system evaluation, we used HealthMap to analyze 778 reports representing 87 disease categories and 89 countries. The automated classifier achieved 84% accuracy, demonstrating its usefulness in managing the large volume of information processed by the system. Accuracy was 91% for ProMED alerts versus 81% for Google News reports, a difference attributable to the more regular structure of ProMED messages.
HealthMap is a useful, free, and open resource that employs text-processing algorithms to identify important disease outbreak information and presents it through a user-friendly interface.