Sieve-based coreference resolution enhances semi-supervised learning model for chemical-induced disease relation extraction.

Author information

Le Hoang-Quynh, Tran Mai-Vu, Dang Thanh Hai, Ha Quang-Thuy, Collier Nigel

Affiliations

Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam. Building E3, 144 Xuan Thuy str., Cau Giay dist., Hanoi, Vietnam. Postal code: 100000.

Publication information

Database (Oxford). 2016 Jul;2016. doi: 10.1093/database/baw102.

DOI:10.1093/database/baw102

PMID:27630201

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4962668/

Abstract

The BioCreative V chemical-disease relation (CDR) track was proposed to accelerate the progress of text mining in facilitating integrative understanding of chemicals, diseases and their relations. In this article, we describe an extension of our system (namely UET-CAM) that participated in the BioCreative V CDR track. The original UET-CAM system was ranked fourth among 18 participating systems by the BioCreative CDR track committee. In the Disease Named Entity Recognition and Normalization (DNER) phase, our system employed joint inference (decoding) with a perceptron-based named entity recognizer (NER) and a back-off model with Semantic Supervised Indexing and Skip-gram for named entity normalization. In the chemical-induced disease (CID) relation extraction phase, we proposed a pipeline that includes a coreference resolution module and a Support Vector Machine relation extraction model. The former module utilized a multi-pass sieve to extend entity recall. In this article, the UET-CAM system was improved by adding a 'silver' CID corpus to train the prediction model. This silver-standard corpus of more than 50 thousand sentences was automatically built based on the Comparative Toxicogenomics Database (CTD). We evaluated our method on the CDR test set. Results showed that our system could reach state-of-the-art performance, with an F1 of 82.44 for the DNER task and 58.90 for the CID task. Analysis demonstrated substantial benefits of both the multi-pass sieve coreference resolution method (F1 +4.13%) and the silver CID corpus (F1 +7.3%).

Database URL: SilverCID, the silver-standard corpus for CID relation extraction, is freely available online at https://zenodo.org/record/34530 (doi:10.5281/zenodo.34530).
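The abstract says the silver corpus was built automatically from CTD but does not describe the procedure. One common way to build such a corpus is distant supervision: a sentence that mentions a chemical and a disease is labeled positive when the pair is listed in the knowledge base. The sketch below illustrates that general idea only and may not match the authors' method. The `Sentence` type, the `label_silver` function, and the toy identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator, Set, Tuple


@dataclass(frozen=True)
class Sentence:
    """A sentence with the entity identifiers already recognized in it (hypothetical type)."""
    text: str
    chemical_ids: FrozenSet[str]
    disease_ids: FrozenSet[str]


def label_silver(
    sentences: Iterable[Sentence],
    known_pairs: Set[Tuple[str, str]],
) -> Iterator[Tuple[str, str, str, bool]]:
    """Yield (text, chemical id, disease id, label) for every co-occurring pair.

    The label is True when the (chemical, disease) pair appears in known_pairs,
    which stands in for relations exported from a knowledge base such as CTD.
    """
    for sentence in sentences:
        for chem in sorted(sentence.chemical_ids):
            for dis in sorted(sentence.disease_ids):
                yield (sentence.text, chem, dis, (chem, dis) in known_pairs)


# Toy data: these identifiers are placeholders, not real CTD or MeSH entries.
known_pairs = {("CHEM:1", "DIS:1")}
sentences = [
    Sentence("Toy sentence mentioning chemical 1 and disease 1.",
             frozenset({"CHEM:1"}), frozenset({"DIS:1"})),
    Sentence("Toy sentence mentioning chemical 1 and disease 2.",
             frozenset({"CHEM:1"}), frozenset({"DIS:2"})),
]
silver = list(label_silver(sentences, known_pairs))
```

Labels produced this way are noisy, since co-occurrence in a sentence does not guarantee that the sentence asserts the relation, which is why such data is usually called "silver" rather than "gold".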
