Multilingual event extraction for epidemic detection.

Authors

Lejeune Gaël, Brixtel Romain, Doucet Antoine, Lucas Nadine

Affiliations

Groupe de Recherche en Informatique, Image et Instrumentation, University of Caen Lower-Normandy, boulevard Maréchal Juin, 14032 Caen, France; Laboratoire d'Informatique de Nantes Atlantique, University of Nantes, 2 rue de la Houssinière, 44322 Nantes, France.

Groupe de Recherche en Informatique, Image et Instrumentation, University of Caen Lower-Normandy, boulevard Maréchal Juin, 14032 Caen, France; Department of Organizational Behavior, Faculty of Business and Economics, Quartier Dorigny, University of Lausanne, 1015 Lausanne, Switzerland.

Publication information

Artif Intell Med. 2015 Oct;65(2):131-43. doi: 10.1016/j.artmed.2015.06.005. Epub 2015 Jul 17.

DOI: 10.1016/j.artmed.2015.06.005
PMID: 26228941

Abstract

OBJECTIVE

This paper presents a multilingual news surveillance system applied to tele-epidemiology. It has been shown that multilingual approaches improve the timeliness of epidemic event detection across the globe, because they eliminate the wait for local news to be translated into major languages. We present here a system that extracts epidemic events in potentially any language, provided a Wikipedia seed for common disease names exists.

METHODS

The Daniel system presented herein relies on properties that are common to news writing (the journalistic genre), the most useful being repetition and saliency. Wikipedia is used to screen common disease names, which are then matched against repeated character strings. Language variations, such as declensions, are handled by processing text at the character level rather than at the word level. This also makes it possible to handle various writing systems in a similar fashion.
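The character-level matching idea can be illustrated with a small Python sketch. This is not the Daniel implementation: the function names, the `min_ratio` threshold, the split of an article into a `head` and a `body`, and the use of `difflib` are all assumptions made for illustration. The sketch flags a disease name when a long enough character string from that name occurs both in the salient head of an article and again in its body, so that a declined form such as Polish "cholery" still matches the seed name "cholera".

```python
from difflib import SequenceMatcher


def longest_shared_substring(term: str, text: str) -> str:
    """Return the longest character string that occurs in both term and text."""
    matcher = SequenceMatcher(None, term, text, autojunk=False)
    match = matcher.find_longest_match(0, len(term), 0, len(text))
    return term[match.a:match.a + match.size]


def detect_diseases(head: str, body: str, disease_names: list[str],
                    min_ratio: float = 0.8) -> list[str]:
    """Flag names that match, at the character level, in both head and body.

    min_ratio is a hypothetical threshold: the share of the name's
    characters that the matched string must cover.
    """
    head, body = head.lower(), body.lower()
    hits = []
    for name in disease_names:
        needed = min_ratio * len(name)
        in_head = longest_shared_substring(name.lower(), head)
        in_body = longest_shared_substring(name.lower(), body)
        # Repetition test: the name must surface in both parts of the article.
        if len(in_head) >= needed and len(in_body) >= needed:
            hits.append(name)
    return hits


# Illustrative input, not taken from the paper's corpus.
names = ["cholera", "grypa"]
title = "Epidemia cholery w regionie"
text = "Liczba przypadków cholery rośnie."
print(detect_diseases(title, text, names))
```

Because the comparison works on characters rather than tokens, the same function applies unchanged to scripts without whitespace word boundaries, although a realistic system would need a faster string-matching algorithm than `difflib` for large volumes.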

MATERIAL

As no multilingual ground truth existed to evaluate the Daniel system, we built a multilingual corpus from the Web and collected annotations from native speakers of Chinese, English, Greek, Polish and Russian who had no connection to or interest in the Daniel system. This data set is freely available online and can be used to evaluate other event extraction systems.

RESULTS

Experiments for 5 languages out of 17 tested are detailed in this paper: Chinese, English, Greek, Polish and Russian. The Daniel system achieves an average F-measure of 82% in these 5 languages. It reaches 87% on BEcorpus, the state-of-the-art corpus in English, slightly below top-performing systems, which are tailored with numerous language-specific resources. The consistent performance of Daniel on multiple languages is an important contribution to the reactivity and the coverage of epidemiological event detection systems.
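For readers unfamiliar with the metric, the sketch below shows how an F-measure and its average over several languages are commonly computed. The abstract does not state which averaging scheme or per-language counts were used, so the function names and every number here are placeholders chosen for illustration, not results from the paper.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_average(scores: list[float]) -> float:
    """Unweighted mean of per-language scores."""
    return sum(scores) / len(scores)


# Hypothetical counts for two unnamed languages (placeholders only).
counts = [(8, 2, 2), (6, 2, 4)]
per_language = [f_measure(*precision_recall(tp, fp, fn)) for tp, fp, fn in counts]
print(per_language, macro_average(per_language))
```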

CONCLUSIONS

Most event extraction systems rely on extensive language-specific resources. While their sophistication yields excellent results (over 90% precision and recall), it restricts their coverage in terms of languages and geographic areas. In contrast, to detect epidemic events in any language, the Daniel system requires only a list of a few hundred disease names and locations, which can be acquired automatically. The system can perform consistently well on any language, with precision and recall around 82% on average, according to this paper's evaluation. Daniel's character-based approach is especially interesting for morphologically rich and low-resource languages. Because it exploits few resources and relies on state-of-the-art string-matching algorithms, Daniel can process thousands of documents per minute on a simple laptop. In the context of epidemic surveillance, reactivity and geographic coverage are of primary importance, since no one knows where the next event will strike, and therefore in which vernacular language it will first be reported. By being able to process any language, the Daniel system offers unique coverage for poorly endowed languages and can complement state-of-the-art techniques for major languages.
