Mao Yuqing, Lu Zhiyong
Nanjing University of Chinese Medicine, 138 Xianlin Avenue, Nanjing, Jiangsu, 210023, China.
National Center for Biotechnology Information (NCBI), 8600 Rockville Pike, Bethesda, MD, 20894, USA.
J Biomed Semantics. 2017 Apr 17;8(1):15. doi: 10.1186/s13326-017-0123-3.
MeSH (Medical Subject Headings) indexing is the task of assigning relevant MeSH terms to scholarly publications, carried out by human indexers who read each article manually. The task is highly important for improving literature retrieval and many other scientific investigations in biomedical research. Unfortunately, because it is manual, MeSH indexing is both time-consuming (new articles typically remain unindexed for 2 or 3 months) and costly (approximately ten dollars per article). In response, automatic indexing by computers has been proposed and attempted, but it remains challenging. To advance the state of the art in automatic MeSH indexing, a community-wide shared task called BioASQ was recently organized.
We propose MeSH Now, an integrated approach that first uses multiple strategies to generate a combined list of candidate MeSH terms for a target article. Through a novel learning-to-rank framework, MeSH Now then ranks the list of candidate terms based on their relevance to the target article. Finally, MeSH Now selects the highest-ranked MeSH terms via a post-processing module.
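The three stages described above (merging candidates from several strategies, ranking them, and selecting the top terms in post-processing) can be illustrated with a minimal Python sketch. It is not MeSH Now's implementation. The strategy names (`neighbors`, `dictionary`), the hand-set linear weights standing in for a trained learning-to-rank model, and the threshold and size limits in `select_terms` are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def merge_candidates(strategy_outputs):
    """Combine candidate MeSH terms produced by several strategies.

    strategy_outputs maps a (hypothetical) strategy name to {mesh_term: score}.
    Returns {mesh_term: {strategy_name: score}}, i.e. one feature dict per term.
    """
    features = defaultdict(dict)
    for strategy, scored_terms in strategy_outputs.items():
        for term, score in scored_terms.items():
            features[term][strategy] = score
    return dict(features)


def rank_candidates(features, weights):
    """Score each candidate with a hand-set linear model (a stand-in for a
    trained learning-to-rank model; strategies without a weight count as 0)
    and return (term, score) pairs sorted from most to least relevant."""
    scored = [
        (term, sum(weights.get(name, 0.0) * value for name, value in feats.items()))
        for term, feats in features.items()
    ]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def select_terms(ranked, threshold=0.5, min_terms=1, max_terms=15):
    """Hypothetical post-processing: keep terms scoring at or above the
    threshold, but always return at least min_terms and at most max_terms."""
    kept = [term for term, score in ranked if score >= threshold]
    if len(kept) < min_terms:
        kept = [term for term, _ in ranked[:min_terms]]
    return kept[:max_terms]
```

For example, with candidates `{"neighbors": {"Humans": 0.9, "Neoplasms": 0.6, "Mice": 0.2}, "dictionary": {"Neoplasms": 0.8, "Apoptosis": 0.7}}` and weights `{"neighbors": 0.6, "dictionary": 0.4}`, "Neoplasms" ranks first because two strategies propose it, and the default threshold keeps only "Neoplasms" and "Humans".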
We assessed MeSH Now on two separate benchmarking datasets using the standard precision, recall and F-score metrics. In both evaluations, MeSH Now achieved F-scores above 0.60, ranging from 0.610 to 0.612. Additional experiments show that MeSH Now can be accelerated with parallel computing to process MEDLINE documents on a large scale.
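As an illustration of how precision, recall and F-score apply to MeSH indexing, the sketch below compares predicted term sets against gold-standard (human-assigned) term sets. It assumes micro-averaging, where true positives, false positives and false negatives are pooled across all documents before the ratios are computed. This is one common convention for this task; the abstract does not specify which averaging was used, and the function name `micro_prf` is hypothetical.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F-score (assumed convention).

    gold and predicted map a document ID (e.g. a PMID) to a set of MeSH terms.
    Only documents present in gold are scored; a missing prediction counts
    as an empty set. Counts are pooled across documents before dividing.
    """
    tp = fp = fn = 0
    for doc_id, gold_terms in gold.items():
        pred_terms = predicted.get(doc_id, set())
        tp += len(gold_terms & pred_terms)  # correctly assigned terms
        fp += len(pred_terms - gold_terms)  # assigned but not in gold
        fn += len(gold_terms - pred_terms)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score
```

For example, if document 1 has gold terms {Humans, Neoplasms, Mice} and predictions {Humans, Neoplasms, Apoptosis}, and document 2 has gold terms {Humans, Apoptosis} and the prediction {Humans}, the pooled counts are 3 true positives, 1 false positive and 2 false negatives. That gives a precision of 0.75, a recall of 0.6 and an F-score of about 0.667.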
We conclude that MeSH Now is a robust approach with state-of-the-art performance for automatic MeSH indexing, and that it can process PubMed-scale document collections within a reasonable time frame.