Hong Yu, George Hripcsak, Carol Friedman
Department of Medical Informatics, Columbia University, New York, New York 10032, USA.
J Am Med Inform Assoc. 2002 May-Jun;9(3):262-72. doi: 10.1197/jamia.m0913.
To develop methods that automatically map abbreviations to their full forms in biomedical articles.
The authors developed two methods for mapping defined and undefined abbreviations (defined abbreviations are paired with their full forms in the articles, whereas undefined ones are not). For defined abbreviations, they developed a set of pattern-matching rules that map an abbreviation to its full form and implemented the rules in a software program, AbbRE (for "abbreviation recognition and extraction"). Using the opinions of domain experts as the reference standard, they evaluated the recall and precision of AbbRE for defined abbreviations in ten biomedical articles randomly selected from the ten most frequently cited medical and biological journals. They also measured the percentage of undefined abbreviations in the same set of articles, and they investigated whether undefined abbreviations could be mapped to any of four public abbreviation databases (GenBank LocusLink, SWISSPROT, LRABR of the UMLS Specialist Lexicon, and BioABACUS).
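To make the idea of a pattern-matching rule for defined abbreviations concrete, here is a minimal sketch of one such rule for the common "full form (ABBR)" pattern. It is not AbbRE's actual rule set, which the abstract does not spell out. The regular expression, the two-words-per-character look-back window, and the names `extract_defined`, `find_full_form`, and `is_subsequence` are all hypothetical choices made for illustration.

```python
import re
from typing import Dict, List, Optional

# Hypothetical rule: a candidate abbreviation is a parenthesized token of
# 2-10 characters that starts with a letter and contains no spaces.
PAREN_ABBR = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def is_subsequence(letters: List[str], text: str) -> bool:
    """True if every character in `letters` occurs in `text`, in order."""
    it = iter(text)
    return all(ch in it for ch in letters)


def find_full_form(abbr: str, preceding_words: List[str]) -> Optional[str]:
    """Return the shortest run of words immediately before the parenthesis
    whose first word starts with the abbreviation's first character and
    that contains all of the abbreviation's characters in order."""
    letters = [c.lower() for c in abbr if c.isalnum()]
    if not any(c.isupper() for c in abbr):
        return None  # e.g. "(see)" is treated as ordinary text
    # Hypothetical window: look back at most two words per abbreviation character.
    max_words = min(len(preceding_words), 2 * len(letters))
    for n in range(1, max_words + 1):
        candidate = preceding_words[-n:]
        if candidate[0][0].lower() != letters[0]:
            continue
        if is_subsequence(letters, " ".join(candidate).lower()):
            return " ".join(candidate)
    return None


def extract_defined(sentence: str) -> Dict[str, str]:
    """Map each defined abbreviation in `sentence` to its full form."""
    pairs: Dict[str, str] = {}
    for match in PAREN_ABBR.finditer(sentence):
        abbr = match.group(1)
        preceding = sentence[: match.start()].split()
        full_form = find_full_form(abbr, preceding)
        if full_form is not None:
            pairs[abbr] = full_form
    return pairs


print(extract_defined("Patients with chronic obstructive pulmonary disease (COPD) were enrolled."))
# {'COPD': 'chronic obstructive pulmonary disease'}
```

Because this single rule looks only to the left of the parenthesis, it misses definitions written the other way round, such as "COPD (chronic obstructive pulmonary disease)". A practical rule set would need additional patterns of that kind.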
AbbRE had an average recall of 0.70 and an average precision of 0.95 for defined abbreviations. The authors found that, on average, 25 percent of the abbreviations in biomedical articles were defined, and that 68 percent of a randomly selected subset of undefined abbreviations could be mapped to at least one of the four abbreviation databases. They also found that many abbreviations are ambiguous (i.e., they map to more than one full form in the abbreviation databases).
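Recall is the fraction of the expert-identified abbreviation/full-form pairs that the program finds. Precision is the fraction of the program's pairs that the experts accept. The sketch below computes both for one article and then averages across articles. It assumes that a prediction counts as correct only when both the abbreviation and its full form match, and it assumes a per-article (macro) average. The abstract does not state how the reported averages were computed. The toy pairs are invented for illustration and are not data from the study.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (abbreviation, full form)


def recall_precision(predicted: Set[Pair], reference: Set[Pair]) -> Tuple[float, float]:
    """Score one article: a prediction counts only if both the abbreviation
    and its full form match a pair in the expert reference standard."""
    correct = len(predicted & reference)
    recall = correct / len(reference) if reference else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return recall, precision


def macro_average(scores: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Average per-article (recall, precision) scores with equal weight."""
    n = len(scores)
    return (sum(r for r, _ in scores) / n, sum(p for _, p in scores) / n)


# Toy data, not from the study.
reference = {
    ("COPD", "chronic obstructive pulmonary disease"),
    ("ATP", "adenosine triphosphate"),
    ("IL-2", "interleukin 2"),
    ("MI", "myocardial infarction"),
}
predicted = {
    ("COPD", "chronic obstructive pulmonary disease"),
    ("ATP", "adenosine triphosphate"),
    ("MI", "mitral insufficiency"),  # wrong full form: a false positive
}
print(recall_precision(predicted, reference))  # (0.5, 0.6666666666666666)
```

In the toy example, the wrong expansion of "MI" lowers precision. The correct "MI" pair and the unfound "IL-2" pair are both missing from the predictions, so they lower recall.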
AbbRE is efficient for mapping defined abbreviations. Coupling AbbRE with abbreviation databases to map undefined abbreviations will require not only exhaustive abbreviation databases but also a method for resolving the ambiguity of abbreviations in those databases.
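The coupling described in the conclusion can be sketched as a two-step lookup. A definition found in the article takes precedence. Otherwise, every full form listed in a database is returned, and more than one candidate signals that disambiguation is still needed. This is a hypothetical illustration rather than the authors' design. `resolve`, `is_ambiguous`, and `TOY_DATABASE` are invented names, and the database entries are illustrative expansions, not records from LocusLink, SWISSPROT, LRABR, or BioABACUS. The sketch only detects ambiguity; it does not resolve it.

```python
from typing import Dict, List, Set

# Toy stand-in for an abbreviation database. The entries are illustrative
# only and are not drawn from LocusLink, SWISSPROT, LRABR, or BioABACUS.
TOY_DATABASE: Dict[str, Set[str]] = {
    "ATP": {"adenosine triphosphate"},
    "MS": {"multiple sclerosis", "mass spectrometry"},
    "PCA": {
        "principal component analysis",
        "patient-controlled analgesia",
        "posterior cerebral artery",
    },
}


def resolve(abbr: str, local_definitions: Dict[str, str],
            database: Dict[str, Set[str]]) -> List[str]:
    """Return candidate full forms for `abbr`.

    A definition found in the article itself (e.g. by a pattern-matching
    extractor) takes precedence; otherwise every database entry is returned,
    sorted for stable output. An empty list means the abbreviation could
    not be mapped.
    """
    if abbr in local_definitions:
        return [local_definitions[abbr]]
    return sorted(database.get(abbr, set()))


def is_ambiguous(candidates: List[str]) -> bool:
    """More than one candidate means a disambiguation step is still needed."""
    return len(candidates) > 1


article_definitions = {"MS": "multiple sclerosis"}  # defined in the article text
for abbr in ["MS", "PCA", "ATP", "XYZ"]:
    candidates = resolve(abbr, article_definitions, TOY_DATABASE)
    print(abbr, candidates, is_ambiguous(candidates))
```

Here "MS" is settled by the article's own definition, and "ATP" has a single database entry. "PCA" is undefined in the article and has several database entries, so it is flagged as ambiguous. "XYZ" cannot be mapped at all.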