James Mork, Alan Aronson, Dina Demner-Fushman
US National Library of Medicine, 8600 Rockville Pike, Bethesda, USA.
J Biomed Semantics. 2017 Feb 23;8(1):8. doi: 10.1186/s13326-017-0113-5.
Facing a growing workload and dwindling resources, the US National Library of Medicine (NLM) created the Indexing Initiative project in 1996. The mission of this cross-library team is to explore indexing methodologies that ensure the quality and currency of NLM document collections. The NLM Medical Text Indexer (MTI) is the main product of this project and has been providing automated indexing recommendations since 2002. After all this time, the question arises whether MTI is still useful and relevant.
To answer the question of MTI's usefulness, we track a wide variety of statistics: how frequently MEDLINE indexers refer to MTI recommendations, how well MTI performs against human indexing, and how often MTI is used. To answer the question of MTI's relevance compared with other available tools, we participated in the 2013 and 2014 BioASQ Challenges, which provided an unbiased comparison between MTI and other systems performing the same task.
Indexers have steadily increased their use of MTI recommendations, from 15.75% of the articles they indexed in 2002 to 62.44% in 2014, showing that they find MTI increasingly useful. The MTI performance statistics show significant improvements over the years in Precision (+0.2992) and F (+0.1997), with modest gains in Recall (+0.0454). MTI consistency is comparable to that reported in available indexer consistency studies. MTI performed well in both BioASQ Challenges, ranking among the top-tier teams.
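The Precision, Recall, and F figures above compare the MeSH terms MTI recommends with the terms human indexers assign. The abstract does not give the exact formulas, so the sketch below assumes the standard set-based definitions for a single article (with F as the balanced F1) and, for consistency, Hooper's inter-indexer measure (shared terms divided by all distinct terms). The function names and example terms are hypothetical, for illustration only.

```python
from typing import Set, Tuple


def precision_recall_f1(recommended: Set[str], indexed: Set[str]) -> Tuple[float, float, float]:
    """Score one article's recommended terms against the human-assigned terms.

    Assumption: standard set-based definitions with balanced F1; not
    necessarily the exact computation used for MTI's reported statistics.
    """
    true_pos = len(recommended & indexed)  # terms both MTI and the indexer chose
    precision = true_pos / len(recommended) if recommended else 0.0
    recall = true_pos / len(indexed) if indexed else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def hooper_consistency(a: Set[str], b: Set[str]) -> float:
    """Hooper's consistency: shared terms / all distinct terms (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical example: terms suggested by an automated indexer vs. a human indexer.
mti_terms = {"Humans", "Neoplasms", "Mice", "Female"}
human_terms = {"Humans", "Neoplasms", "Female", "Adult", "Male"}

p, r, f = precision_recall_f1(mti_terms, human_terms)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=0.75 R=0.60 F=0.67
print(f"consistency={hooper_consistency(mti_terms, human_terms):.2f}")  # consistency=0.50
```

How per-article scores are aggregated over a test collection (micro- or macro-averaging) is a further choice the abstract does not specify.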
Based on our findings, yes, MTI is still relevant and useful, and needs to be improved and expanded. The BioASQ Challenge results have shown that we need to incorporate more machine learning into MTI while still retaining the indexing rules that have earned MTI the indexers' trust over the years. We also need to expand MTI through the use of full text, when and where it is available, to provide coverage of indexing terms that are typically only found in the full text. The role of MTI at NLM is also expanding into new areas, further reinforcing the idea that MTI is increasingly useful and relevant.