Ao Hiroko, Takagi Toshihisa
Department of Computational Biology, University of Tokyo CB01, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba, 277-8561, Japan.
J Am Med Inform Assoc. 2005 Sep-Oct;12(5):576-86. doi: 10.1197/jamia.M1757. Epub 2005 May 19.
To help biomedical researchers recognize dynamically introduced abbreviations in biomedical literature, such as gene and protein names, we have constructed a support system called ALICE (Abbreviation LIfter using Corpus-based Extraction). ALICE aims to extract all types of abbreviations with their expansions from a target paper on the fly.
ALICE extracts an abbreviation and its expansion from the literature by using heuristic pattern-matching rules. The system consists of three phases and, by combining these rules, can potentially identify 320 valid abbreviation-expansion patterns.
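To make the idea of a heuristic pattern-matching rule concrete, the following minimal Python sketch implements one simple rule: a parenthesized short form is accepted when its letters match the initials of the words immediately preceding it. This is an illustrative assumption, not one of ALICE's actual rules; the function name `extract_pairs`, the regular expression, and the length limits are all hypothetical.

```python
import re

# Hypothetical illustration only: a single first-letter heuristic,
# not a reproduction of ALICE's rule set.
# Matches a short alphanumeric token enclosed in parentheses, e.g. "(EGFR)".
PAREN = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def extract_pairs(text):
    """Return (abbreviation, expansion) pairs found via '(ABBR)' patterns."""
    pairs = []
    for match in PAREN.finditer(text):
        abbr = match.group(1)
        # Letters and digits of the short form, lower-cased for comparison.
        letters = [c.lower() for c in abbr if c.isalnum()]
        # Candidate expansion: as many preceding words as the short form has letters.
        words = text[:match.start()].split()
        candidate = words[-len(letters):]
        if len(candidate) < len(letters):
            continue
        # Accept only if each word's initial equals the corresponding letter.
        initials = [w.lstrip("([\"'")[:1].lower() for w in candidate]
        if initials == letters:
            pairs.append((abbr, " ".join(candidate)))
    return pairs


sentence = "We studied the epidermal growth factor receptor (EGFR) in mice."
print(extract_pairs(sentence))
```

A single rule like this misses many real cases, for example short forms that take several letters from one word or that precede their expansion, which is presumably why a system of this kind combines many rules.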
It achieved 95% recall and 97% precision on randomly selected titles and abstracts from the MEDLINE database.
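For readers unfamiliar with the two measures, the sketch below shows how recall and precision are conventionally computed for extracted abbreviation-expansion pairs against a manually annotated gold standard. The function name `precision_recall` and the toy pairs are hypothetical and are not data from the evaluation reported here.

```python
def precision_recall(extracted, gold):
    """Compute (precision, recall) for extracted pairs against gold pairs."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: fraction of extracted pairs that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: fraction of gold-standard pairs that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy example (hypothetical pairs, for illustration only).
gold = {
    ("EGFR", "epidermal growth factor receptor"),
    ("TNF", "tumor necrosis factor"),
}
extracted = {("EGFR", "epidermal growth factor receptor")}
print(precision_recall(extracted, gold))
```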
ALICE extracted abbreviations and their expansions from the literature efficiently. Its carefully compiled heuristics enabled it to extract abbreviations with high recall without significantly reducing precision. ALICE not only facilitates recognition of undefined abbreviations in a paper by constructing an abbreviation database or dictionary, but also makes biomedical literature retrieval more accurate. This system is freely available at http://uvdb3.hgc.jp/ALICE/ALICE_index.html.