Department of Biomedical Informatics, Arizona State University, Arizona, USA.
J Biomed Inform. 2010 Oct;43(5):694-700. doi: 10.1016/j.jbi.2010.04.001. Epub 2010 Apr 9.
The rapid growth of biomedical literature is evident in the increasing size of the MEDLINE research database. Medical Subject Headings (MeSH), a controlled set of keywords, are used to index all the citations contained in the database to facilitate search and retrieval. This volume of citations calls for efficient tools to assist indexers at the US National Library of Medicine (NLM). Currently, the Medical Text Indexer (MTI) system provides assistance by recommending MeSH terms based on the title and abstract of an article using a combination of distributional and vocabulary-based methods. In this paper, we evaluate a novel approach toward indexer assistance by using nearest neighbor classification in combination with Reflective Random Indexing (RRI), a scalable alternative to the established methods of distributional semantics. On a test set provided by the NLM, our approach significantly outperforms the MTI system, suggesting that the RRI approach would make a useful addition to the current methodologies.
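The combination of nearest neighbor classification and random indexing named above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give dimensionality, weighting, number of training cycles, or the scoring rule, so `DIM`, `SEEDS`, `cycles`, `k`, and the summed-cosine voting below are all assumptions, and the documents and labels are invented toy data rather than MEDLINE citations or real MeSH assignments. The sketch assumes a term-based reflective variant: terms start with sparse random vectors, document vectors are built from term vectors, and term vectors are then rebuilt from the document vectors.

```python
import math
import random
from collections import defaultdict

DIM = 64     # vector dimensionality (assumed; chosen small for illustration)
SEEDS = 4    # non-zero entries per sparse random vector (assumed)


def random_index_vector(rng, dim=DIM, nonzero=SEEDS):
    """Sparse ternary vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzero)):
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def add_into(target, source):
    for i, x in enumerate(source):
        target[i] += x


def cosine(a, b):
    # Inputs are unit length (or zero), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def doc_vector(tokens, term_vecs):
    """Document vector = normalized sum of the vectors of its known terms."""
    vec = [0.0] * DIM
    for tok in tokens:
        if tok in term_vecs:
            add_into(vec, term_vecs[tok])
    return normalize(vec)


def train_rri(docs, cycles=2, seed=0):
    """Reflective training (assumed variant): terms -> documents -> terms."""
    rng = random.Random(seed)
    vocab = sorted({t for d in docs for t in d})
    term_vecs = {t: random_index_vector(rng) for t in vocab}
    for _ in range(cycles - 1):
        doc_vecs = [doc_vector(d, term_vecs) for d in docs]
        new_terms = {t: [0.0] * DIM for t in vocab}
        for d, dv in zip(docs, doc_vecs):
            for tok in d:
                add_into(new_terms[tok], dv)
        term_vecs = {t: normalize(v) for t, v in new_terms.items()}
    return term_vecs


def recommend(query_tokens, docs, labels, term_vecs, k=2, top_n=3):
    """Rank labels by summed cosine similarity over the k nearest documents."""
    q = doc_vector(query_tokens, term_vecs)
    nearest = sorted(
        ((cosine(q, doc_vector(d, term_vecs)), i) for i, d in enumerate(docs)),
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for sim, i in nearest:
        for label in labels[i]:
            votes[label] += sim
    ranked = sorted(votes.items(), key=lambda kv: -kv[1])
    return [label for label, _ in ranked[:top_n]]


# Invented toy corpus: tokenized "abstracts" with hypothetical index terms.
DOCS = [
    ["heart", "attack", "aspirin"],
    ["heart", "failure", "diuretic"],
    ["gene", "expression", "tumor"],
    ["tumor", "gene", "mutation"],
]
LABELS = [
    ["Myocardial Infarction", "Aspirin"],
    ["Heart Failure", "Diuretics"],
    ["Gene Expression", "Neoplasms"],
    ["Neoplasms", "Mutation"],
]

TERM_VECS = train_rri(DOCS, cycles=2, seed=0)
SUGGESTED = recommend(["tumor", "gene"], DOCS, LABELS, TERM_VECS, k=2)
```

Here `SUGGESTED` holds up to three candidate labels for the unseen query, drawn from the labels of its two most similar training documents. The only state kept per term is a fixed-width vector, and training is a small number of additive passes over the corpus, which is the property that makes this family of methods attractive at MEDLINE scale.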