Mapping abbreviations to full forms in biomedical articles.

Authors

Yu Hong, Hripcsak George, Friedman Carol

Affiliation

Department of Medical Informatics, Columbia University, New York, New York 10032, USA.

Publication

J Am Med Inform Assoc. 2002 May-Jun;9(3):262-72. doi: 10.1197/jamia.m0913.

Abstract

OBJECTIVE

To develop methods that automatically map abbreviations to their full forms in biomedical articles.

METHODS

The authors developed two methods, one for mapping defined abbreviations and one for undefined abbreviations (defined abbreviations are paired with their full forms in the articles, whereas undefined ones are not). For defined abbreviations, they developed a set of pattern-matching rules to map an abbreviation to its full form and implemented the rules in a software program, AbbRE (for "abbreviation recognition and extraction"). Using the opinions of domain experts as a reference standard, they evaluated the recall and precision of AbbRE for defined abbreviations in ten biomedical articles randomly selected from the ten most frequently cited medical and biological journals. They also measured the percentage of undefined abbreviations in the same set of articles, and they investigated whether undefined abbreviations could be mapped to any of four public abbreviation databases (GenBank LocusLink, SWISSPROT, LRABR of the UMLS Specialist Lexicon, and BioABACUS).
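
The abstract does not list AbbRE's actual rules, so the following is only a minimal sketch of what one pattern-matching rule for defined abbreviations might look like. It assumes the common "full form (ABBR)" layout and a simple first-letter heuristic; the regular expression `PAREN_ABBR` and the function `find_defined_abbreviations` are hypothetical names introduced for illustration, not part of AbbRE.

```python
import re

# Assumption: a defined abbreviation appears in parentheses right after its
# full form, e.g. "chronic obstructive pulmonary disease (COPD)".
PAREN_ABBR = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def find_defined_abbreviations(text):
    """Return (abbreviation, full_form) pairs found by a first-letter heuristic."""
    pairs = []
    for match in PAREN_ABBR.finditer(text):
        abbr = match.group(1)
        letters = [c.lower() for c in abbr if c.isalpha()]
        # Words that precede the opening parenthesis.
        words = re.findall(r"[A-Za-z][A-Za-z-]*", text[:match.start()])
        # Take as many preceding words as the abbreviation has letters.
        candidate = words[-len(letters):]
        # Accept only if each word starts with the corresponding letter.
        if len(candidate) == len(letters) and all(
            w[0].lower() == c for w, c in zip(candidate, letters)
        ):
            pairs.append((abbr, " ".join(candidate)))
    return pairs


sample = "Patients with chronic obstructive pulmonary disease (COPD) were enrolled."
pairs = find_defined_abbreviations(sample)
```

A single rule like this misses abbreviations that skip words or use interior letters, which is one reason a rule set rather than a single pattern is used in practice.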

RESULTS

AbbRE achieved an average recall of 0.70 and an average precision of 0.95 for the defined abbreviations. The authors found that an average of 25 percent of abbreviations were defined in biomedical articles and that, of a randomly selected subset of undefined abbreviations, 68 percent could be mapped to at least one of the four abbreviation databases. They also found that many abbreviations are ambiguous (i.e., they map to more than one full form in abbreviation databases).
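
As a reminder of how recall and precision against a reference standard are typically computed, here is a small sketch. The function `recall_precision` and the toy pairs are illustrative assumptions; they are not the study's data or evaluation code.

```python
def recall_precision(extracted, reference):
    """Compare extracted (abbreviation, full form) pairs with a reference standard."""
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    # Recall: share of reference pairs that the system found.
    recall = true_positives / len(reference) if reference else 0.0
    # Precision: share of extracted pairs that are in the reference.
    precision = true_positives / len(extracted) if extracted else 0.0
    return recall, precision


# Toy example with made-up pairs (not taken from the paper).
reference = {("A", "alpha"), ("B", "beta"), ("C", "gamma"), ("D", "delta")}
extracted = {("A", "alpha"), ("B", "beta"), ("C", "gamma"), ("X", "xi")}
recall, precision = recall_precision(extracted, reference)
```

In this toy case three of the four reference pairs are found and three of the four extracted pairs are correct, so both measures equal 0.75.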

CONCLUSION

AbbRE is efficient for mapping defined abbreviations. Coupling AbbRE with abbreviation databases to map undefined abbreviations requires both exhaustive abbreviation databases and a method for resolving the ambiguity of abbreviations in those databases.
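
The ambiguity problem named in the conclusion can be illustrated with a short sketch. It assumes each database is a plain dictionary from an abbreviation to a list of full forms; the database names `db_a` and `db_b`, their entries, and the functions `lookup` and `classify` are hypothetical and do not reflect the contents or interfaces of the four databases in the study.

```python
def lookup(abbr, databases):
    """Collect the distinct full forms for `abbr` across all databases."""
    full_forms = set()
    for entries in databases.values():
        full_forms.update(entries.get(abbr, ()))
    return sorted(full_forms)


def classify(abbr, databases):
    """Label an abbreviation as unmapped, unambiguous, or ambiguous."""
    forms = lookup(abbr, databases)
    if not forms:
        return "unmapped"
    return "unambiguous" if len(forms) == 1 else "ambiguous"


# Toy databases with illustrative entries only.
databases = {
    "db_a": {"CAT": ["catalase"], "TNF": ["tumor necrosis factor"]},
    "db_b": {"CAT": ["chloramphenicol acetyltransferase"]},
}
labels = {abbr: classify(abbr, databases) for abbr in ("CAT", "TNF", "XYZ")}
```

An abbreviation labeled "ambiguous" here would still need a disambiguation step, for example one that uses the surrounding text, before it can be mapped to a single full form.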
