Dept. of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands.
J Biomed Inform. 2012 Oct;45(5):879-84. doi: 10.1016/j.jbi.2012.04.004. Epub 2012 Apr 25.
Corpora annotated with specific entities and relationships are essential for training and evaluating text-mining systems that extract structured information from a large corpus. In this paper we describe an approach in which a named-entity recognition system produces a first annotation and annotators revise it using a web-based interface. The agreement figures show that inter-annotator agreement is much better than agreement with the system-provided annotations. The corpus has been annotated for drugs, disorders, genes, and their inter-relationships. For each of the drug-disorder, drug-target, and target-disorder relations, three experts annotated a set of 100 abstracts. These annotated relationships will be used to train and evaluate text-mining software that captures these relationships in text.
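The comparison between inter-annotator agreement and agreement with the system annotations can be illustrated with a minimal sketch. The abstract does not state which agreement measure was used, so the pairwise F-score below is an assumption. The function names and the toy drug-disorder annotations are hypothetical and are not taken from the corpus.

```python
from itertools import combinations


def f_score(a, b):
    """Agreement between two annotation sets as F1 (assumed metric, symmetric in a and b)."""
    if not a and not b:
        return 1.0  # two empty annotation sets agree trivially
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(b)
    recall = overlap / len(a)
    return 2 * precision * recall / (precision + recall)


def mean_pairwise(sets):
    """Average F1 over all unordered pairs of annotation sets."""
    pairs = list(combinations(sets, 2))
    return sum(f_score(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: each item is (abstract_id, drug, disorder).
system = {
    (1, "aspirin", "headache"),
    (1, "aspirin", "ulcer"),
    (2, "metformin", "diabetes"),
}
annotators = {
    "A": {(1, "aspirin", "headache"), (2, "metformin", "diabetes"), (2, "insulin", "diabetes")},
    "B": {(1, "aspirin", "headache"), (2, "metformin", "diabetes"), (2, "insulin", "diabetes")},
    "C": {(1, "aspirin", "headache"), (2, "metformin", "diabetes")},
}

# Agreement among the three experts, and of each expert with the system output.
inter_annotator = mean_pairwise(list(annotators.values()))
with_system = sum(f_score(system, s) for s in annotators.values()) / len(annotators)

print(f"inter-annotator F1: {inter_annotator:.3f}")
print(f"annotator-vs-system F1: {with_system:.3f}")
```

In this toy case the experts agree with each other more than with the system, which is the pattern the abstract reports. A chance-corrected measure such as Cohen's kappa could be substituted for F1 without changing the structure of the comparison.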