Playing biology's name game: identifying protein names in scientific text.

Author information

Hanisch Daniel, Fluck Juliane, Mevissen Heinz-Theodor, Zimmer Ralf

Affiliation

Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, D-53754 Sankt Augustin, Germany.

Publication information

Pac Symp Biocomput. 2003:403-14.

Abstract

A growing body of work is devoted to the extraction of protein or gene interaction information from the scientific literature. Yet the basis for most extraction algorithms, i.e., the specific and sensitive recognition of protein and gene names and their numerous synonyms, has not been adequately addressed. Here we describe the construction of a comprehensive general-purpose name dictionary and an accompanying automatic curation procedure based on a simple token model of protein names. We designed an efficient search algorithm to analyze all abstracts in MEDLINE in a reasonable amount of time on standard computers. The parameters of our method are optimized using machine learning techniques. Used together, these components yield good search performance. A supplementary web page is available at http://cartan.gmd.de/ProMiner/.
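The abstract does not spell out the token model or the search algorithm. As a rough illustration only, the Python sketch below shows one common way to do dictionary-based name matching: each synonym is normalized into a sequence of lowercase tokens, and text is scanned with a greedy longest match over those token sequences. The toy dictionary, the tokenizer, and the function names (`tokenize`, `build_index`, `find_proteins`) are hypothetical. This is not ProMiner's algorithm, and it omits the automatic curation and machine-learned parameters the authors describe.

```python
import re
from collections import defaultdict

# Hypothetical toy dictionary: canonical symbol -> synonyms.
# Illustrative entries only, not taken from the ProMiner dictionary.
SYNONYMS = {
    "TP53": ["p53", "tumor protein p53", "TP53"],
    "BRCA1": ["BRCA1", "breast cancer type 1 susceptibility protein"],
}


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (a simple token model)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(synonyms):
    """Map each synonym's token tuple to the set of symbols it may name."""
    index = defaultdict(set)
    max_len = 0
    for symbol, names in synonyms.items():
        for name in names:
            toks = tuple(tokenize(name))
            if toks:
                index[toks].add(symbol)
                max_len = max(max_len, len(toks))
    return index, max_len


def find_proteins(text, index, max_len):
    """Greedy longest-match scan over the token stream.

    Returns (token_start, token_end, symbols) tuples, end exclusive.
    """
    toks = tokenize(text)
    hits = []
    i = 0
    while i < len(toks):
        match = None
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(toks) - i), 0, -1):
            symbols = index.get(tuple(toks[i:i + n]))  # .get does not insert
            if symbols:
                match = (i, i + n, sorted(symbols))
                break
        if match:
            hits.append(match)
            i = match[1]  # skip past the matched name
        else:
            i += 1
    return hits


if __name__ == "__main__":
    index, max_len = build_index(SYNONYMS)
    text = "Mutations in TP53 and BRCA1 alter tumor protein p53 signalling."
    print(find_proteins(text, index, max_len))
```

For the sample sentence, the scan reports three hits: `TP53` at tokens 2 to 3, `BRCA1` at 4 to 5, and the three-token synonym "tumor protein p53" at 6 to 9, which the longest-match rule prefers over the shorter "p53". Exact matching on normalized tokens keeps the lookup fast; a real system would additionally need to handle spelling variants and ambiguous short names, which is where curation and tuned parameters come in.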
