Evaluation of techniques for increasing recall in a dictionary approach to gene and protein name identification.

Schuemie Martijn J, Mons Barend, Weeber Marc, Kors Jan A

Department of Medical Informatics, Erasmus University Medical Center Rotterdam, P.O. Box 1738, 3000 DR, Rotterdam, The Netherlands.

J Biomed Inform. 2007 Jun;40(3):316-24. doi: 10.1016/j.jbi.2006.09.002. Epub 2006 Sep 24.

Gene and protein name identification in text requires a dictionary approach to relate synonyms to the same gene or protein, and to link names to external databases. However, existing dictionaries are incomplete. We investigate two complementary methods for automatic generation of a comprehensive dictionary: combination of information from existing gene and protein databases and rule-based generation of spelling variations. Both methods have been reported in literature before, but have hitherto not been combined and evaluated systematically. We combined gene and protein names from several existing databases of four different organisms. The combined dictionaries showed a substantial increase in recall on three different test sets, as compared to any single database. Application of 23 spelling variation rules to the combined dictionaries further increased recall. However, many rules appeared to have no effect and some appear to have a detrimental effect on precision.
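The abstract describes two ways of growing a name dictionary: taking the union of names from several databases, and generating spelling variants with rules. The toy Python sketch below illustrates both ideas and how each can raise recall. It is not the paper's implementation. The following are all hypothetical: the two sources, the gene IDs, the five rules (these are not the paper's 23 rules), and the gold mentions. The sketch also assumes the sources already share identifiers, and it scores recall on pre-segmented mentions rather than on running text.

```python
import re
from collections import defaultdict

# Hypothetical toy sources: gene ID -> names. Real databases use their own IDs,
# so a cross-reference step would be needed before merging.
SOURCE_A = {"GENE:1": ["IL-2", "interleukin 2"]}
SOURCE_B = {"GENE:1": ["T cell growth factor"], "GENE:2": ["TNF alpha"]}

# Hypothetical gold standard: (mention as it appears in text, correct gene ID).
GOLD = [("IL2", "GENE:1"), ("T cell growth factor", "GENE:1"), ("TNF-alpha", "GENE:2")]


def combine_dictionaries(*sources):
    """Union the synonym lists of all sources, keyed by a shared gene ID."""
    combined = defaultdict(set)
    for source in sources:
        for gene_id, names in source.items():
            combined[gene_id].update(names)
    return dict(combined)


ROMAN = {"1": "I", "2": "II", "3": "III", "4": "IV"}
ARABIC = {roman: digit for digit, roman in ROMAN.items()}

# Illustrative spelling-variation rules (hypothetical, NOT the paper's 23 rules).
RULES = [
    lambda s: s.replace("-", ""),                      # IL-2 -> IL2
    lambda s: s.replace("-", " "),                     # IL-2 -> IL 2
    lambda s: re.sub(r"(?<=[A-Za-z])(?=\d)", "-", s),  # IL2 -> IL-2
    lambda s: re.sub(r"(?<=[\s-])([1-4])$",
                     lambda m: ROMAN[m.group(1)], s),   # IL-2 -> IL-II
    lambda s: re.sub(r"(?<=[\s-])(IV|III|II|I)$",
                     lambda m: ARABIC[m.group(1)], s),  # IL-II -> IL-2
]


def expand(names, rules=RULES, max_rounds=3):
    """Apply every rule to every new name until no further variants appear."""
    variants = set(names)
    frontier = set(names)
    for _ in range(max_rounds):
        new = {rule(name) for name in frontier for rule in rules} - variants
        if not new:
            break
        variants |= new
        frontier = new
    return variants


def build_index(dictionary, rules=RULES):
    """Map each lower-cased name variant to the set of gene IDs it may denote."""
    index = defaultdict(set)
    for gene_id, names in dictionary.items():
        for variant in expand(names, rules):
            index[variant.lower()].add(gene_id)
    return index


def recall(index, gold):
    """Fraction of gold mentions whose lookup returns the correct gene ID."""
    hits = sum(1 for mention, gene_id in gold
               if gene_id in index.get(mention.lower(), set()))
    return hits / len(gold)


if __name__ == "__main__":
    combined = combine_dictionaries(SOURCE_A, SOURCE_B)
    print(f"{recall(build_index(SOURCE_A, rules=[]), GOLD):.2f}")  # 0.00, one source
    print(f"{recall(build_index(combined, rules=[]), GOLD):.2f}")  # 0.33, combined
    print(f"{recall(build_index(combined), GOLD):.2f}")            # 0.67, combined + rules
```

On this toy data, merging the two sources recovers "T cell growth factor". Adding the rules then recovers "IL2". "TNF-alpha" is still missed, because no rule here turns a space into a hyphen. Chaining the rules also yields the implausible string "ILII" from "IL-2". This is a small illustration of how blindly generated variants can add noise to a dictionary.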
