Combination of text-mining algorithms increases the performance.

Authors

Rainer Malik, Lude Franke, Arno Siebes

Affiliation

Universiteit Utrecht, Department of Information and Computing Sciences, Padualaan 14, 3584CH Utrecht, The Netherlands.

Publication information

Bioinformatics. 2006 Sep 1;22(17):2151-7. doi: 10.1093/bioinformatics/btl281. Epub 2006 Jun 9.

DOI:10.1093/bioinformatics/btl281

PMID:16766558

Abstract

MOTIVATION

Recently, several information extraction systems have been developed to retrieve relevant information out of biomedical text. However, these methods represent individual efforts. In this paper, we show that by combining different algorithms and their outcome, the results improve significantly. For this reason, CONAN has been created, a system which combines different programs and their outcome. Its methods include tagging of gene/protein names, finding interaction and mutation data, tagging of biological concepts and linking to MeSH and Gene Ontology terms.

RESULTS

In this paper, we will present data that show that combining different text-mining algorithms significantly improves the results. Not only is CONAN a full-scale approach that will ultimately cover all of PubMed/MEDLINE, we also show that this universality has no effect on quality: our system performs as well as or better than existing systems.

AVAILABILITY

The LDD corpus presented is available by request to the author. The system will be available shortly. For information and updates on CONAN please visit http://www.cs.uu.nl/people/rainer/conan.html.

Similar articles

Constructing biological networks through combined literature mining and microarray analysis: a LMMA approach.

Bioinformatics. 2006 Sep 1;22(17):2143-50. doi: 10.1093/bioinformatics/btl363. Epub 2006 Jul 4.

Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction.

Bioinformatics. 2005 Apr 15;21(8):1653-8. doi: 10.1093/bioinformatics/bti165. Epub 2004 Nov 25.

Building a protein name dictionary from full text: a machine learning term extraction approach.

BMC Bioinformatics. 2005 Apr 7;6:88. doi: 10.1186/1471-2105-6-88.

Recognizing names in biomedical texts: a machine learning approach.

Bioinformatics. 2004 May 1;20(7):1178-90. doi: 10.1093/bioinformatics/bth060. Epub 2004 Feb 10.

Ranking the whole MEDLINE database according to a large training set using text indexing.

BMC Bioinformatics. 2005 Mar 24;6:75. doi: 10.1186/1471-2105-6-75.

Gene symbol disambiguation using knowledge-based profiles.

Bioinformatics. 2007 Apr 15;23(8):1015-22. doi: 10.1093/bioinformatics/btm056. Epub 2007 Feb 21.

Bioinformatics. 2006 Sep 15;22(18):2298-304. doi: 10.1093/bioinformatics/btl388. Epub 2006 Aug 22.

Comparison of character-level and part of speech features for name recognition in biomedical texts.

J Biomed Inform. 2004 Dec;37(6):423-35. doi: 10.1016/j.jbi.2004.08.008.

Extraction of regulatory gene/protein networks from Medline.

Bioinformatics. 2006 Mar 15;22(6):645-50. doi: 10.1093/bioinformatics/bti597. Epub 2005 Jul 26.

Cited by

Automatic extraction of protein-protein interactions using grammatical relationship graph.

BMC Med Inform Decis Mak. 2018 Jul 23;18(Suppl 2):42. doi: 10.1186/s12911-018-0628-4.

Survey of Natural Language Processing Techniques in Bioinformatics.

Comput Math Methods Med. 2015;2015:674296. doi: 10.1155/2015/674296. Epub 2015 Oct 7.

The Growing Importance of CNVs: New Insights for Detection and Clinical Interpretation.

Front Genet. 2013 May 30;4:92. doi: 10.3389/fgene.2013.00092. eCollection 2013.

Bayesian inference for genomic data integration reduces misclassification rate in predicting protein-protein interactions.

PLoS Comput Biol. 2011 Jul;7(7):e1002110. doi: 10.1371/journal.pcbi.1002110. Epub 2011 Jul 28.

Using information mining of the medical literature to improve drug safety.

J Am Med Inform Assoc. 2011 Sep-Oct;18(5):668-74. doi: 10.1136/amiajnl-2011-000096. Epub 2011 May 5.

Extracting causal relations on HIV drug resistance from literature.

BMC Bioinformatics. 2010 Feb 23;11:101. doi: 10.1186/1471-2105-11-101.

Bayesian inference of protein-protein interactions from biological literature.

Bioinformatics. 2009 Jun 15;25(12):1536-42. doi: 10.1093/bioinformatics/btp245. Epub 2009 Apr 15.

DDESC: Dragon database for exploration of sodium channels in human.

BMC Genomics. 2008 Dec 20;9:622. doi: 10.1186/1471-2164-9-622.

Evading the annotation bottleneck: using sequence similarity to search non-sequence gene data.

BMC Bioinformatics. 2008 Oct 17;9:442. doi: 10.1186/1471-2105-9-442.

Integrating protein-protein interactions and text mining for protein function prediction.

BMC Bioinformatics. 2008 Jul 22;9 Suppl 8(Suppl 8):S2. doi: 10.1186/1471-2105-9-S8-S2.
