PADI-web corpus: Labeled textual data in animal health domain.

Author information

Rabatel Julien, Arsevska Elena, Roche Mathieu

Affiliations

Cirad, Montpellier, France.

ASTRE, Cirad, INRA, Montpellier, France.

Publication information

Data Brief. 2018 Dec 23;22:643-646. doi: 10.1016/j.dib.2018.12.063. eCollection 2019 Feb.

Abstract

Monitoring animal health worldwide, especially the early detection of outbreaks of emerging pathogens, is one of the means of preventing the introduction of infectious diseases into countries (Collier et al., 2008) [3]. In this context, we developed PADI-web, a Platform for Automated extraction of animal Disease Information from the Web (Arsevska et al., 2016, 2018). PADI-web is a text-mining tool that automatically detects, categorizes and extracts disease outbreak information from Web news articles. PADI-web currently monitors the Web for five emerging animal infectious diseases: African swine fever, avian influenza (both highly pathogenic and low pathogenic), foot-and-mouth disease, bluetongue, and Schmallenberg virus infection. PADI-web collects Web news articles in near real time through RSS feeds. Currently, PADI-web collects disease information from Google News because of its international, multilingual coverage. We implemented machine learning techniques to identify the relevant disease information in texts (i.e., the location and date of an outbreak, the affected hosts, their numbers, and clinical signs). To train the Information Extraction (IE) model on news articles, domain experts manually labeled a corpus in English. This data paper presents that labeled corpus (Rabatel et al., 2017).
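The abstract says that the expert-labeled corpus is used to train an IE model for outbreak location, date, affected hosts, their numbers and clinical signs. It does not describe the annotation format or the learning method. What follows is therefore only a minimal sketch of one common preparation step, built on stated assumptions. It assumes annotations are stored as character-offset spans. The label names (LOCATION, DATE, HOST, NUMBER, SIGN), the helpers `Span`, `span_of` and `to_bio`, and the example sentence are all hypothetical and invented for illustration. The sketch converts span annotations into token-level BIO tags of the kind a sequence labeler is typically trained on.

```python
import re
from dataclasses import dataclass


@dataclass
class Span:
    """A hypothetical annotation: character offsets into the text plus a label."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # hypothetical label set: LOCATION, DATE, HOST, NUMBER, SIGN


def tokenize(text):
    """Split into word and punctuation tokens, returning (token, start, end)."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def span_of(text, phrase, label):
    """Build a Span from the first occurrence of `phrase` (raises ValueError if absent)."""
    start = text.index(phrase)
    return Span(start, start + len(phrase), label)


def to_bio(text, spans):
    """Tag each token B-<label>, I-<label> or O according to the spans covering it."""
    tagged = []
    seen = set()  # indices of spans whose first token has already been tagged
    for tok, s, e in tokenize(text):
        tag = "O"
        for i, sp in enumerate(spans):
            if s >= sp.start and e <= sp.end:  # token lies fully inside this span
                tag = ("I-" if i in seen else "B-") + sp.label
                seen.add(i)
                break
        tagged.append((tok, tag))
    return tagged


# Invented example sentence, for illustration only (not taken from the corpus).
text = "Avian influenza killed 300 chickens in Ohio on 12 March."
spans = [
    span_of(text, "300", "NUMBER"),
    span_of(text, "chickens", "HOST"),
    span_of(text, "Ohio", "LOCATION"),
    span_of(text, "12 March", "DATE"),
]
tagged = to_bio(text, spans)
for tok, tag in tagged:
    print(f"{tok}\t{tag}")
```

In this sketch, "12 March" becomes `B-DATE` followed by `I-DATE`, and every token outside an annotated span is tagged `O`. Character offsets are used here because they survive any later change of tokenizer. The trade-off is that a span boundary falling inside a token is silently dropped, so a real pipeline would need to check span/token alignment.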

Similar articles

Using text mining techniques to extract phenotypic information from the PhenoCHF corpus.
BMC Med Inform Decis Mak. 2015;15 Suppl 2(Suppl 2):S3. doi: 10.1186/1472-6947-15-S2-S3. Epub 2015 Jun 15.