Peng Shengwen, You Ronghui, Wang Hongning, Zhai Chengxiang, Mamitsuka Hiroshi, Zhu Shanfeng
School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China.
Department of Computer Science, University of Virginia, Charlottesville 22904-4740, USA.
Bioinformatics. 2016 Jun 15;32(12):i70-i79. doi: 10.1093/bioinformatics/btw294.
Medical Subject Headings (MeSH) indexing, the assignment of a set of MeSH main headings to each citation, is crucial for many important tasks in biomedical text mining and information retrieval. Large-scale MeSH indexing poses challenges on two sides: the citation side and the MeSH side. On the citation side, all existing methods, including the Medical Text Indexer (MTI) from the National Library of Medicine and the state-of-the-art method, MeSHLabeler, represent text as a bag of words, which cannot capture semantic and context-dependent information well.
We propose DeepMeSH, which incorporates deep semantic information into large-scale MeSH indexing and addresses the challenges on both the citation side and the MeSH side. The citation-side challenge is addressed by a new deep semantic representation, D2V-TFIDF, which concatenates sparse and dense semantic representations. The MeSH-side challenge is addressed by using the 'learning to rank' framework of MeSHLabeler, which integrates various types of evidence generated from the new semantic representation.
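As a rough illustration of what concatenating a sparse and a dense representation can look like, the sketch below builds a TF-IDF vector for each toy citation and appends it to a dense vector. This is a minimal sketch under stated assumptions, not the DeepMeSH implementation: the function names, the smoothed IDF formula, the L2 normalization of each part and the hard-coded dense vectors (stand-ins for the output of a document-embedding model) are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumption; other weighting schemes are possible)
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[tok] * idf[tok] for tok in vocab])
    return vocab, vectors


def l2_normalize(vec):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def d2v_tfidf(dense_vec, sparse_vec):
    """Concatenate a dense and a sparse vector after normalizing each part.

    Normalizing each part is an assumption made here so that neither
    representation dominates the combined vector.
    """
    return l2_normalize(dense_vec) + l2_normalize(sparse_vec)


# Toy tokenized citations (hypothetical data)
docs = [
    ["mesh", "indexing", "biomedical", "citations"],
    ["deep", "semantic", "representation", "citations"],
]
# Placeholder dense vectors; in practice these would come from a trained
# document-embedding model, which is not reproduced here.
dense_vecs = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.5]]

vocab, sparse_vecs = tfidf_vectors(docs)
combined = [d2v_tfidf(d, s) for d, s in zip(dense_vecs, sparse_vecs)]
```

Each combined vector has as many dimensions as the dense vector plus the vocabulary size, so a downstream classifier or nearest-neighbour search sees both kinds of evidence at once.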
On the BioASQ3 challenge data of 6000 citations, DeepMeSH achieved a Micro F-measure of 0.6323, about 2% higher in relative terms than MeSHLabeler (0.6218) and 12% higher than MTI (0.5637).
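For readers unfamiliar with the metric, a micro-averaged F-measure pools true positives, predicted labels and gold labels over all citations before computing precision and recall. The sketch below shows the standard definition on hypothetical toy labels and then checks the relative differences between the three reported scores. The function names and the toy data are illustrative assumptions, not the BioASQ evaluation code.

```python
def micro_f_measure(gold, pred):
    """Micro-averaged F-measure over parallel lists of label sets."""
    true_pos = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


def relative_gain_percent(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


# Hypothetical gold and predicted MeSH headings for two citations
gold = [{"Humans", "Algorithms"}, {"Medical Subject Headings"}]
pred = [{"Humans"}, {"Medical Subject Headings", "Software"}]
toy_score = micro_f_measure(gold, pred)

# Scores reported in the abstract
gain_vs_meshlabeler = relative_gain_percent(0.6323, 0.6218)
gain_vs_mti = relative_gain_percent(0.6323, 0.5637)
```

With the reported scores, the relative gains come out at a little under 2% over MeSHLabeler and a little over 12% over MTI, which matches the rounded figures in the abstract.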
The software is available upon request.
Supplementary data are available at Bioinformatics online.