Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD.
AMIA Annu Symp Proc. 2021 Jan 25;2020:1031-1040. eCollection 2020.
This year, fewer than 200 National Library of Medicine indexers expect to index 1 million articles, a workload that would not be possible without the assistance of the Medical Text Indexer (MTI) system. MTI is an automated indexing system that provides MeSH main heading/subheading pair recommendations to assist indexers with this workload. Over the years, considerable research effort has focused on improving main heading prediction performance, but automated fine-grained indexing with main heading/subheading pairs has received much less attention. This work revisits the subheading attachment problem and demonstrates substantial performance improvements using modern Convolutional Neural Network classifiers. The best performing method is shown to outperform the current MTI implementation, with a 3.7% absolute improvement in precision and a 27.6% absolute improvement in recall. In a manual review of false positive predictions, 70% were judged to be acceptable indexing.
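To make the reported metrics concrete, the following is a minimal sketch of how precision and recall could be computed over main heading/subheading pairs. It assumes micro-averaging across articles, which the abstract does not specify. The function name, article IDs, and toy pairs are hypothetical and are not taken from the paper or from MTI.

```python
from typing import Dict, Set, Tuple

# A recommendation is a (main heading, subheading) pair.
Pair = Tuple[str, str]


def micro_precision_recall(
    gold: Dict[str, Set[Pair]],
    predicted: Dict[str, Set[Pair]],
) -> Tuple[float, float]:
    """Micro-averaged precision and recall over all articles (an assumption)."""
    tp = fp = fn = 0
    for article_id in gold.keys() | predicted.keys():
        g = gold.get(article_id, set())
        p = predicted.get(article_id, set())
        tp += len(g & p)  # pairs recommended and chosen by indexers
        fp += len(p - g)  # pairs recommended but not chosen by indexers
        fn += len(g - p)  # pairs chosen by indexers but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data for illustration only; these are not results from the paper.
gold = {
    "a1": {("Neoplasms", "drug therapy"), ("Neoplasms", "genetics")},
    "a2": {("Asthma", "diagnosis")},
}
predicted = {
    "a1": {("Neoplasms", "drug therapy"), ("Neoplasms", "pathology")},
    "a2": {("Asthma", "diagnosis")},
}

precision, recall = micro_precision_recall(gold, predicted)
```

Under this reading, an "absolute improvement" is the plain difference between two such scores expressed in percentage points, not a relative change. A false positive here is any recommended pair missing from the indexers' choices, which is why a manual review can still find many of them acceptable.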