Dina Demner-Fushman, James G. Mork
National Library of Medicine, National Institutes of Health, HHS, Bethesda, MD, USA.
AMIA Annu Symp Proc. 2015 Nov 5;2015:484-91. eCollection 2015.
Characteristics of the subjects of biomedical research are important in determining whether a publication describing the research is relevant to a search. To facilitate finding relevant publications, MEDLINE citations provide Medical Subject Headings that describe the subjects' characteristics, such as their species, gender, and age. We seek to improve the recommendation of these headings by the Medical Text Indexer (MTI), which supports manual indexing of MEDLINE. To that end, we explore the potential of the full text of the publications. Using simple recall-oriented, rule-based methods, we determined that adding sentences extracted from the methods sections and captions to the abstracts prior to MTI processing significantly improved recall and F1 score, with only a slight drop in precision. Improvements were also achieved by directly assigning several headings extracted from the full text. These results indicate the need for further development of automated methods capable of leveraging the full text for indexing.
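The approach summarized in the abstract can be illustrated with a minimal sketch. The abstract does not list the actual rules, so the cue patterns, heading names, and function names below are hypothetical stand-ins: sentences from the methods section and captions are kept if they mention any subject-characteristic cue (recall-oriented selection), appended to the abstract, and the resulting heading recommendations are scored with precision, recall, and F1. The call to MTI itself is not shown.

```python
import re

# Hypothetical cue patterns for a few subject-characteristic headings.
# These are illustrative assumptions, not the rules used in the paper.
CUES = {
    "Humans": re.compile(r"\b(patients?|participants?|volunteers?)\b", re.I),
    "Mice": re.compile(r"\b(mice|mouse|murine)\b", re.I),
    "Female": re.compile(r"\b(females?|women|woman)\b", re.I),
    "Male": re.compile(r"\b(males?|men|man)\b", re.I),
}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def select_sentences(sections):
    """Keep every methods/caption sentence that matches at least one cue.

    Any single match is enough, which favors recall over precision.
    """
    selected = []
    for name in ("methods", "captions"):
        for sentence in split_sentences(sections.get(name, "")):
            if any(p.search(sentence) for p in CUES.values()):
                selected.append(sentence)
    return selected


def augment_abstract(abstract, sections):
    """Append the selected full-text sentences to the abstract.

    The augmented text is what would be passed on to the indexer.
    """
    return " ".join([abstract] + select_sentences(sections))


def precision_recall_f1(predicted, gold):
    """Score recommended headings against manually assigned ones."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    sections = {
        "methods": "Female C57BL/6 mice were housed in groups. "
                   "Buffers were prepared fresh.",
        "captions": "Figure 1. Survival of mice after treatment.",
    }
    print(augment_abstract("We studied tumor growth.", sections))
    print(precision_recall_f1({"Mice", "Female", "Humans"},
                              {"Mice", "Female", "Male", "Animals"}))
```

Matching on any single cue keeps the selection deliberately permissive, which mirrors the trade-off reported above: more relevant sentences reach the indexer, at the cost of some spurious ones.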