Malik Rainer, Franke Lude, Siebes Arno
Universiteit Utrecht, Department of Information and Computing Sciences, Padualaan 14, 3584CH Utrecht, The Netherlands.
Bioinformatics. 2006 Sep 1;22(17):2151-7. doi: 10.1093/bioinformatics/btl281. Epub 2006 Jun 9.
Recently, several information extraction systems have been developed to retrieve relevant information from biomedical text. However, these methods represent individual efforts. In this paper, we show that combining different algorithms and their outcomes improves the results significantly. For this reason, we created CONAN, a system that combines different programs and their outcomes. Its methods include tagging of gene/protein names, finding interaction and mutation data, tagging of biological concepts and linking to MeSH and Gene Ontology terms.
We present data showing that combining different text-mining algorithms significantly improves the results. CONAN is a full-scale approach that will ultimately cover all of PubMed/MEDLINE, and we show that this universality does not come at the cost of quality: our system performs as well as or better than existing systems.
The LDD corpus presented is available on request from the author. The system will be available shortly. For information and updates on CONAN, please visit http://www.cs.uu.nl/people/rainer/conan.html.
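The abstract does not say how CONAN merges the outcomes of its component programs. As a rough illustration of the general idea of combining several taggers, the sketch below keeps an annotation only when enough taggers agree on it. This is an assumption for illustration, not CONAN's actual method; the names `Annotation`, `combine_by_vote`, and `min_votes` and the sample tagger outputs are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Annotation(NamedTuple):
    """A tagged text span (hypothetical structure, not CONAN's format)."""
    start: int   # character offset where the span begins
    end: int     # character offset where the span ends (exclusive)
    label: str   # e.g. "GENE" or "MUTATION"


def combine_by_vote(outputs: Iterable[Iterable[Annotation]],
                    min_votes: int = 2) -> List[Annotation]:
    """Keep annotations reported by at least `min_votes` taggers.

    Simple exact-match voting, assumed here purely for illustration.
    """
    votes: Counter = Counter()
    for annotations in outputs:
        # set() ensures each tagger votes at most once per annotation
        votes.update(set(annotations))
    return sorted(a for a, n in votes.items() if n >= min_votes)


# Made-up outputs of three hypothetical taggers on the same text
tagger_a = [Annotation(0, 5, "GENE"), Annotation(20, 28, "MUTATION")]
tagger_b = [Annotation(0, 5, "GENE")]
tagger_c = [Annotation(0, 5, "GENE"), Annotation(20, 28, "MUTATION"),
            Annotation(40, 44, "GENE")]

combined = combine_by_vote([tagger_a, tagger_b, tagger_c])
print(combined)
```

Raising `min_votes` trades recall for precision: with three taggers, `min_votes=3` keeps only spans that every tagger reports, while `min_votes=1` takes the union of all outputs.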