Delbecque Thierry, Zweigenbaum Pierre
LIMSI, CNRS, F-91403 Orsay, France.
AMIA Annu Symp Proc. 2010 Nov 13;2010:147-51.
Because a large number of new papers regularly enter the MEDLINE database, there is an ongoing effort to design tools that help index this new material. Here we investigate the hypothesis that past indexing information reached through citation and authorship links can be used for this purpose. Using a JAMA-based subset of MEDLINE, we designed ranking scores that rely on this information; given a new article, these scores aim to build an ordered list of the MeSH terms that should be used to index it. We report evaluation measures on an independent 1000-document data set. Comparison with comparable work shows gains in recall, F-measure and mean average precision. Moreover, cited articles and authors' past articles contribute to seven of the top ten ranking features, supporting our hypothesis. Further improvements and extensions to this work are discussed in the conclusion.
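The idea summarized in the abstract, ranking candidate MeSH terms for a new article from the terms already assigned to the articles it cites and to its authors' earlier articles, can be illustrated with a minimal sketch. The abstract does not give the actual scoring features, so everything here is an assumption made for illustration: the function names, the weights `w_cited` and `w_author`, and the simple normalized-frequency score are hypothetical and are not the authors' method. The two helper functions compute the standard evaluation measures the abstract mentions (precision, recall and F-measure at a cutoff, and average precision for a single document).

```python
from collections import Counter


def rank_mesh_terms(cited_mesh, author_mesh, w_cited=1.0, w_author=1.0):
    """Rank candidate MeSH terms for a new article (illustrative only).

    cited_mesh:  one list of MeSH terms per cited article.
    author_mesh: one list of MeSH terms per earlier article by the authors.
    The score of a term is the weighted fraction of articles in each
    source that carry it; the weights are hypothetical.
    """
    scores = Counter()
    for source, weight in ((cited_mesh, w_cited), (author_mesh, w_author)):
        for terms in source:
            for term in set(terms):  # count each term once per article
                scores[term] += weight / len(source)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))


def precision_recall_f(ranked, gold, k):
    """Precision, recall and F-measure of the top-k ranked terms."""
    top = ranked[:k]
    hits = sum(1 for term in top if term in gold)
    p = hits / len(top) if top else 0.0
    r = hits / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


def average_precision(ranked, gold):
    """Average precision of one ranked list against the gold indexing."""
    hits, total = 0, 0.0
    for rank, term in enumerate(ranked, start=1):
        if term in gold:
            hits += 1
            total += hits / rank
    return total / len(gold) if gold else 0.0


# Toy data, invented for the example.
cited = [["Humans", "Neoplasms"], ["Humans", "Risk Factors"]]
past = [["Humans", "Neoplasms"]]
gold = {"Humans", "Risk Factors"}

ranked = rank_mesh_terms(cited, past)
print(ranked)
print(precision_recall_f(ranked, gold, k=2))
print(average_precision(ranked, gold))
```

Averaging `average_precision` over all test documents would give the mean average precision reported in evaluations of this kind.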