Ambiguity of human gene symbols in LocusLink and MEDLINE: creating an inventory and a disambiguation test collection.

Authors

Weeber Marc, Schijvenaars Bob J, Van Mulligen Erik M, Mons Barend, Jelier Rob, Van Der Eijk Christian C, Kors Jan A

Affiliation

Department of Medical Informatics, Erasmus MC, 3000 DR Rotterdam, The Netherlands.

Publication

AMIA Annu Symp Proc. 2003;2003:704-8.

Abstract

New genes are discovered almost daily, and each needs a name. Although guidelines for gene nomenclature exist, the naming process is highly creative. Human genes are often named with a gene symbol and a longer, more descriptive term; the short form is very often an abbreviation of the long form. Abbreviations in biomedical language are highly ambiguous, i.e., one gene symbol often refers to more than one gene. Using an existing abbreviation expansion algorithm, we explore MEDLINE for the use of human gene symbols derived from LocusLink. Just over 40% of these symbols occur in MEDLINE; however, many of these occurrences are not related to genes. In the process of making this inventory, a disambiguation test collection is constructed automatically.
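The abstract does not say which abbreviation expansion algorithm was used or how the test collection was assembled. The following is only a minimal sketch of the general idea, under stated assumptions. The lexicon `GENE_LEXICON`, the example sentences, and the helpers `expand` and `build_inventory` are all hypothetical, and the expansion heuristic is a simplified stand-in, not the authors' method. It looks for "long form (SYMBOL)" patterns, recovers a candidate long form, and labels each occurrence as a gene sense or another sense.

```python
import re
from collections import defaultdict

# Toy lexicon: symbol -> known gene long forms.
# Illustrative only; NOT extracted from LocusLink.
GENE_LEXICON = {"CAT": {"catalase"}}


def is_subsequence(short, long_form):
    """True if the characters of `short` appear in order within `long_form`."""
    it = iter(long_form)
    return all(ch in it for ch in short)


def expand(symbol, preceding):
    """Return the shortest run of words ending `preceding` that could spell
    `symbol`, or None. Simplified heuristic; the window size is an assumption."""
    words = re.findall(r"[A-Za-z0-9-]+", preceding)
    max_words = min(len(symbol) + 5, len(symbol) * 2)
    for n in range(1, min(max_words, len(words)) + 1):
        candidate = words[-n:]
        # Require the long form to start with the symbol's first character.
        if candidate[0][0].lower() != symbol[0].lower():
            continue
        long_form = " ".join(candidate).lower()
        if is_subsequence(symbol.lower(), long_form):
            return long_form
    return None


def build_inventory(sentences, lexicon):
    """Collect the expansions seen per symbol and a labeled test collection."""
    inventory = defaultdict(lambda: defaultdict(list))
    test_collection = []
    for sentence in sentences:
        for symbol, gene_names in lexicon.items():
            pattern = r"\(" + re.escape(symbol) + r"\)"
            for match in re.finditer(pattern, sentence):
                long_form = expand(symbol, sentence[:match.start()])
                if long_form is None:
                    continue
                label = "gene" if long_form in gene_names else "other"
                inventory[symbol][long_form].append(sentence)
                test_collection.append((symbol, long_form, label))
    return inventory, test_collection


# Made-up example sentences, not MEDLINE records.
SENTENCES = [
    "Serum catalase (CAT) activity was measured.",
    "Patients underwent computed axial tomography (CAT) of the chest.",
]

inventory, test_collection = build_inventory(SENTENCES, GENE_LEXICON)
for row in test_collection:
    print(row)
```

In this sketch only occurrences that come with an explicit parenthetical definition are counted, so the labels fall out of the expansion step without manual annotation. That is one plausible reading of how a test collection could be built automatically.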

Similar articles

2
Thesaurus-based disambiguation of gene symbols.
BMC Bioinformatics. 2005 Jun 16;6:149. doi: 10.1186/1471-2105-6-149.
3
Gene symbol disambiguation using knowledge-based profiles.
Bioinformatics. 2007 Apr 15;23(8):1015-22. doi: 10.1093/bioinformatics/btm056. Epub 2007 Feb 21.
5
Link-topic model for biomedical abbreviation disambiguation.
J Biomed Inform. 2015 Feb;53:367-80. doi: 10.1016/j.jbi.2014.12.013. Epub 2014 Dec 30.
8
SaRAD: a Simple and Robust Abbreviation Dictionary.
Bioinformatics. 2004 Mar 1;20(4):527-33. doi: 10.1093/bioinformatics/btg439. Epub 2004 Jan 22.
9
Disambiguation in the biomedical domain: the role of ambiguity type.
J Biomed Inform. 2010 Dec;43(6):972-81. doi: 10.1016/j.jbi.2010.08.009. Epub 2010 Sep 9.

Cited by

1
The effect of word sense disambiguation accuracy on literature based discovery.
BMC Med Inform Decis Mak. 2016 Jul 18;16 Suppl 1(Suppl 1):57. doi: 10.1186/s12911-016-0296-1.
9
Retrieval with gene queries.
BMC Bioinformatics. 2006 Apr 21;7:220. doi: 10.1186/1471-2105-7-220.
10
Thesaurus-based disambiguation of gene symbols.
BMC Bioinformatics. 2005 Jun 16;6:149. doi: 10.1186/1471-2105-6-149.

