National Institute of Informatics, 2-1-2, Hitotsubashi, Chiyoda-ku, Tokyo, Japan.
Int J Med Inform. 2010 Apr;79(4):284-96. doi: 10.1016/j.ijmedinf.2010.01.014. Epub 2010 Feb 13.
The emergence and re-emergence of disease outbreaks of international concern in the last several years have raised the importance of health surveillance systems that exploit the open media for the timely and precise detection of events. However, one of the key barriers faced by current event-based health surveillance systems is identifying fine-grained terms for an outbreak's geographical location. In this article, we present a method to tackle this problem by associating each reported event with the most specific spatial information available in a news report. This would be useful not only for health surveillance systems, but also for other event-centered processing systems.
To develop an automated spatial attribute annotation system, we first created a gold standard corpus for training a machine learning model. Since a qualitative analysis of the data suggested that the event class might have an impact on spatial attribute annotation, we also developed an event classification system to incorporate event class information into the spatial attribute annotation model. To automatically recognize the spatial attribute of events, we explored several approaches, ranging from a simple heuristic technique to a more sophisticated approach based on a state-of-the-art Conditional Random Fields (CRFs) model. Different feature sets were incorporated into the model and compared.
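As a rough illustration of how event class information could enter a CRF feature set, the sketch below builds one feature dictionary per token. The feature names, the example sentence, and the `"outbreak"` class label are all hypothetical and are not taken from the article; the actual feature sets compared in the study are not listed here.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int, event_class: str) -> Dict[str, object]:
    """Build a feature dict for token i (hypothetical feature set)."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "lower": tok.lower(),
        "is_title": tok.istitle(),    # capitalized tokens are often place names
        "is_upper": tok.isupper(),
        "suffix3": tok[-3:].lower(),
        "event_class": event_class,   # assumed output of a separate event classifier
    }
    # Context features from the neighboring tokens.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True           # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True           # end of sentence
    return feats


def sentence_features(tokens: List[str], event_class: str) -> List[Dict[str, object]]:
    """Feature dicts for every token in one sentence."""
    return [token_features(tokens, i, event_class) for i in range(len(tokens))]


# Made-up example sentence, for illustration only.
sentence = "Three cholera cases were confirmed in Harare".split()
features = sentence_features(sentence, event_class="outbreak")
```

A sequence of such dictionaries, paired with per-token labels from a gold standard corpus, is the kind of input a linear-chain CRF toolkit typically trains on.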
The evaluations were conducted on 100 outbreak news articles. Spatial attribute recognition performance was evaluated based on three metrics: precision, recall, and the harmonic mean of precision and recall (F-score). Among the three strategies proposed in this article, the CRF model appeared to be the most promising for spatial attribute recognition, with a best performance of 85.5% F-score (86.3% precision and 84.7% recall).
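The reported F-score can be checked directly from the reported precision and recall, since it is their harmonic mean. The short sketch below assumes the balanced (equally weighted) F-score; the function name is illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values reported for the best CRF configuration, in percent.
best_f = f_score(86.3, 84.7)
print(round(best_f, 1))  # 85.5
```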
We presented a methodology for associating each event in media outbreak reports with its spatial attribute at the finest level of granularity. Our goal has been to provide a means for enhancing the spatial understanding of outbreak-related events. Evaluation studies showed promising results for automatic spatial attribute annotation. In the future, we plan to explore more features, such as semantic correlation between words, that may be useful for the spatial attribute annotation task.