Mapping abbreviations to full forms in biomedical articles.

Authors

Yu Hong, Hripcsak George, Friedman Carol

Affiliation

Department of Medical Informatics, Columbia University, New York, New York 10032, USA.

Publication information

J Am Med Inform Assoc. 2002 May-Jun;9(3):262-72. doi: 10.1197/jamia.m0913.

DOI:10.1197/jamia.m0913

PMID:11971887

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC344586/

Abstract

OBJECTIVE

To develop methods that automatically map abbreviations to their full forms in biomedical articles.

METHODS

The authors developed two methods of mapping defined and undefined abbreviations (defined abbreviations are paired with their full forms in the articles, whereas undefined ones are not). For defined abbreviations, they developed a set of pattern-matching rules to map an abbreviation to its full form and implemented the rules into a software program, AbbRE (for "abbreviation recognition and extraction"). Using the opinions of domain experts as a reference standard, they evaluated the recall and precision of AbbRE for defined abbreviations in ten biomedical articles randomly selected from the ten most frequently cited medical and biological journals. They also measured the percentage of undefined abbreviations in the same set of articles, and they investigated whether they could map undefined abbreviations to any of four public abbreviation databases (GenBank LocusLink, SWISSPROT, LRABR of the UMLS Specialist Lexicon, and BioABACUS).
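The abstract does not list AbbRE's rules, so the following is only a minimal sketch of what one pattern-matching rule for defined abbreviations could look like. It assumes the common "full form (ABBR)" layout and a simple first-letter match; the function name `find_defined_abbreviations` and the regular expression are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical illustration only; these are NOT AbbRE's actual rules.
# Matches a short parenthesized token such as "(COPD)" or "(IL-2)".
PAREN = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def find_defined_abbreviations(sentence):
    """Return (abbreviation, full_form) pairs for 'full form (ABBR)' patterns.

    A pair is accepted only when each alphanumeric character of the
    abbreviation equals the first character of one of the words that
    immediately precede the parenthesis (case-insensitive).
    """
    pairs = []
    for match in PAREN.finditer(sentence):
        abbr = match.group(1)
        # Words that appear before the opening parenthesis.
        words = re.findall(r"[A-Za-z0-9]+", sentence[:match.start()])
        letters = [c for c in abbr if c.isalnum()]
        n = len(letters)
        if n > len(words):
            continue
        candidate = words[-n:]
        if all(w[0].lower() == c.lower() for w, c in zip(candidate, letters)):
            pairs.append((abbr, " ".join(candidate)))
    return pairs


if __name__ == "__main__":
    text = "Patients with chronic obstructive pulmonary disease (COPD) were enrolled."
    print(find_defined_abbreviations(text))
```

A single first-letter rule like this misses many real definitions (for example, abbreviations that skip function words or take several letters from one word), which is one reason a rule set rather than a single rule is needed.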

RESULTS

AbbRE had an average recall of 0.70 and an average precision of 0.95 for the defined abbreviations. The authors found that, on average, 25 percent of abbreviations in biomedical articles were defined and that, of a randomly selected subset of undefined abbreviations, 68 percent could be mapped to at least one of the four abbreviation databases. They also found that many abbreviations are ambiguous (i.e., they map to more than one full form in abbreviation databases).
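For readers unfamiliar with the two measures, here is a minimal sketch of how recall and precision can be computed against a reference standard. The function name `precision_recall` and the toy pairs are illustrative assumptions; they are not the study's data or its evaluation code.

```python
def precision_recall(extracted, reference):
    """Compare extracted (abbreviation, full form) pairs with a reference set.

    precision = correct extracted pairs / all extracted pairs
    recall    = correct extracted pairs / all reference pairs
    """
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy pairs, invented purely for illustration.
    reference = {("A", "alpha"), ("B", "beta"), ("C", "gamma"), ("D", "delta")}
    extracted = {("A", "alpha"), ("B", "beta"), ("C", "wrong form")}
    print(precision_recall(extracted, reference))
```

High precision with lower recall, as reported for AbbRE, means that the pairs the program extracts are usually correct but that it misses some of the pairs the experts identified.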

CONCLUSION

AbbRE is efficient for mapping defined abbreviations. To couple AbbRE with abbreviation databases for the mapping of undefined abbreviations, not only exhaustive abbreviation databases but also a method to resolve the ambiguity of abbreviations in the databases are needed.
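The ambiguity problem named in the conclusion can be illustrated with a small lookup sketch. It assumes each database is available as a list of (abbreviation, full form) pairs; the names `build_index` and `lookup`, the database labels, and the toy entries are hypothetical and are not drawn from LocusLink, SWISSPROT, LRABR, or BioABACUS.

```python
from collections import defaultdict


def build_index(databases):
    """Merge several abbreviation lists into one abbreviation -> full forms index."""
    index = defaultdict(set)
    for entries in databases.values():
        for abbr, full_form in entries:
            # Lower-case full forms so spelling variants in case collapse.
            index[abbr].add(full_form.lower())
    return index


def lookup(abbr, index):
    """Return a status ('unmapped', 'unique', or 'ambiguous') and the full forms."""
    forms = sorted(index.get(abbr, ()))
    if not forms:
        status = "unmapped"
    elif len(forms) == 1:
        status = "unique"
    else:
        status = "ambiguous"
    return status, forms


if __name__ == "__main__":
    # Toy entries for illustration only.
    databases = {
        "db_a": [("PCR", "polymerase chain reaction"), ("APC", "antigen presenting cell")],
        "db_b": [("PCR", "Polymerase chain reaction"), ("APC", "adenomatous polyposis coli")],
    }
    index = build_index(databases)
    for abbr in ("PCR", "APC", "XYZ"):
        print(abbr, lookup(abbr, index))
```

A lookup of this kind can only report that an abbreviation is ambiguous; choosing the right full form for a given article still requires a separate disambiguation method, as the conclusion notes.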

Similar articles

Mapping abbreviations to full forms in biomedical articles.

J Am Med Inform Assoc. 2002 May-Jun;9(3):262-72. doi: 10.1197/jamia.m0913.

Using MEDLINE as a knowledge source for disambiguating abbreviations and acronyms in full-text biomedical journal articles.

J Biomed Inform. 2007 Apr;40(2):150-9. doi: 10.1016/j.jbi.2006.06.001. Epub 2006 Jun 7.

ALICE: an algorithm to extract abbreviations from MEDLINE.

J Am Med Inform Assoc. 2005 Sep-Oct;12(5):576-86. doi: 10.1197/jamia.M1757. Epub 2005 May 19.

ADAM: another database of abbreviations in MEDLINE.

Bioinformatics. 2006 Nov 15;22(22):2813-8. doi: 10.1093/bioinformatics/btl480. Epub 2006 Sep 18.

Resolving abbreviations to their senses in Medline.

Bioinformatics. 2005 Sep 15;21(18):3658-64. doi: 10.1093/bioinformatics/bti586. Epub 2005 Jul 21.

A study of abbreviations in the UMLS.

Proc AMIA Symp. 2001:393-7.

Abbreviation definition identification based on automatic precision estimates.

BMC Bioinformatics. 2008 Sep 25;9:402. doi: 10.1186/1471-2105-9-402.

Creating an online dictionary of abbreviations from MEDLINE.

J Am Med Inform Assoc. 2002 Nov-Dec;9(6):612-20. doi: 10.1197/jamia.m1139.

[Developing and evaluating an auto-retrieval algorithm for abbreviations in academic articles].

Nihon Hoshasen Gijutsu Gakkai Zasshi. 2009 Aug 20;65(8):1025-31. doi: 10.6009/jjrt.65.1025.

Using UMLS lexical resources to disambiguate abbreviations in clinical text.

AMIA Annu Symp Proc. 2011;2011:715-22. Epub 2011 Oct 22.

Cited by

Knowledge Injected Prompt Based Fine-tuning for Multi-label Few-shot ICD Coding.

Proc Conf Empir Methods Nat Lang Process. 2022 Dec;2022:1767-1781.

Predicting emergency department orders with multilabel machine learning techniques and simulating effects on length of stay.

J Am Med Inform Assoc. 2019 Dec 1;26(12):1427-1436. doi: 10.1093/jamia/ocz171.

Mapping anatomical related entities to human body parts based on wikipedia in discharge summaries.

BMC Bioinformatics. 2019 Aug 17;20(1):430. doi: 10.1186/s12859-019-3005-0.

Automated PDF highlighting to support faster curation of literature for Parkinson's and Alzheimer's disease.

Database (Oxford). 2017 Jan 1;2017(1). doi: 10.1093/database/bax027.

Synonym extraction and abbreviation expansion with ensembles of semantic spaces.

J Biomed Semantics. 2014 Feb 5;5(1):6. doi: 10.1186/2041-1480-5-6.

Machine learning with naturally labeled data for identifying abbreviation definitions.

BMC Bioinformatics. 2011 Jun 9;12 Suppl 3(Suppl 3):S6. doi: 10.1186/1471-2105-12-S3-S6.

Automatically classifying sentences in full-text biomedical articles into introduction, methods, results and discussion.

Summit Transl Bioinform. 2009 Mar 1;2009:6-10.

Recent progress in automatically extracting information from the pharmacogenomic literature.

Pharmacogenomics. 2010 Oct;11(10):1467-89. doi: 10.2217/pgs.10.136.

Automatically extracting information needs from complex clinical questions.

J Biomed Inform. 2010 Dec;43(6):962-71. doi: 10.1016/j.jbi.2010.07.007. Epub 2010 Jul 27.

Building a high-quality sense inventory for improved abbreviation disambiguation.

Bioinformatics. 2010 May 1;26(9):1246-53. doi: 10.1093/bioinformatics/btq129. Epub 2010 Mar 25.
