Annotation and detection of drug effects in text for pharmacovigilance.

Author information

Thompson Paul, Daikou Sophia, Ueno Kenju, Batista-Navarro Riza, Tsujii Jun'ichi, Ananiadou Sophia

Affiliations

National Centre for Text Mining, School of Computer Science, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK.

Artificial Intelligence Research Center, National Research and Development Agency (AIST), Tokyo Waterfront 2-3-2 Aomi, Koto-ku, Tokyo, 135-0064, Japan.

Publication information

J Cheminform. 2018 Aug 13;10(1):37. doi: 10.1186/s13321-018-0290-y.

DOI:10.1186/s13321-018-0290-y

PMID:30105604

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6089860/

Abstract

Pharmacovigilance (PV) databases record the benefits and risks of different drugs, as a means to ensure their safe and effective use. Creating and maintaining such resources can be complex, since a particular medication may have divergent effects in different individuals, due to specific patient characteristics and/or interactions with other drugs being administered. Textual information from various sources can provide important evidence to curators of PV databases about the usage and effects of drug targets in different medical subjects. However, the efficient identification of relevant evidence can be challenging, due to the increasing volume of textual data. Text mining (TM) techniques can support curators by automatically detecting complex information, such as interactions between drugs, diseases and adverse effects. This semantic information supports the quick identification of documents containing information of interest (e.g., the different types of patients in which a given adverse drug reaction has been observed to occur). TM tools are typically adapted to different domains by applying machine learning methods to corpora that are manually labelled by domain experts using annotation guidelines to ensure consistency. We present a semantically annotated corpus of 597 MEDLINE abstracts, PHAEDRA, encoding rich information on drug effects and their interactions, whose quality is assured through the use of detailed annotation guidelines and the demonstration of high levels of inter-annotator agreement (e.g., 92.6% F-Score for identifying named entities and 78.4% F-Score for identifying complex events, when relaxed matching criteria are applied). To our knowledge, the corpus is unique in the domain of PV, according to the level of detail of its annotations. To illustrate the utility of the corpus, we have trained TM tools based on its rich labels to recognise drug effects in text automatically. 
The corpus and annotation guidelines are available at http://www.nactem.ac.uk/PHAEDRA/.
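
The abstract reports inter-annotator agreement as an F-Score under relaxed matching criteria. As a rough illustration of what such a figure measures, the sketch below scores one annotator's entity spans against another's, counting a pair as a match either when the offsets are identical (exact) or when the spans merely overlap and share a type (relaxed). This is an assumption-laden sketch, not the procedure used for PHAEDRA: the span representation, the function names `spans_match` and `agreement_f_score`, the labels `Drug` and `Disorder`, and the definition of "relaxed" as any overlap are all hypothetical choices made for the example.

```python
from typing import List, Tuple

# Hypothetical span format: (start offset, end offset, entity type); end is exclusive.
Span = Tuple[int, int, str]


def spans_match(a: Span, b: Span, relaxed: bool) -> bool:
    """Exact matching needs identical offsets; relaxed matching accepts any overlap."""
    if a[2] != b[2]:  # entity types must agree in both modes
        return False
    if relaxed:
        return a[0] < b[1] and b[0] < a[1]
    return a[0] == b[0] and a[1] == b[1]


def agreement_f_score(reference: List[Span], other: List[Span], relaxed: bool = True) -> float:
    """Treat one annotator as the reference and score the other against it."""
    if not reference or not other:
        return 0.0
    # Precision: share of the other annotator's spans that match some reference span.
    matched_other = sum(any(spans_match(r, o, relaxed) for r in reference) for o in other)
    # Recall: share of reference spans that match some span from the other annotator.
    matched_reference = sum(any(spans_match(r, o, relaxed) for o in other) for r in reference)
    precision = matched_other / len(other)
    recall = matched_reference / len(reference)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative annotations only; the labels are not taken from the PHAEDRA scheme.
annotator_a = [(0, 9, "Drug"), (25, 33, "Disorder")]
annotator_b = [(0, 9, "Drug"), (20, 33, "Disorder"), (40, 48, "Drug")]

relaxed_f = agreement_f_score(annotator_a, annotator_b, relaxed=True)
exact_f = agreement_f_score(annotator_a, annotator_b, relaxed=False)
print(f"relaxed F-Score: {relaxed_f:.3f}, exact F-Score: {exact_f:.3f}")
```

In this toy case the second annotator marks a longer `Disorder` span than the first, so the pair counts as agreement only under relaxed matching, which is why relaxed scores are typically higher than exact ones.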
