DeepMeSH: deep semantic representation for improving large-scale MeSH indexing.

Authors

Peng Shengwen, You Ronghui, Wang Hongning, Zhai Chengxiang, Mamitsuka Hiroshi, Zhu Shanfeng

Affiliations

School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China.

Department of Computer Science, University of Virginia, Charlottesville 22904-4740, USA.

Publication information

Bioinformatics. 2016 Jun 15;32(12):i70-i79. doi: 10.1093/bioinformatics/btw294.

Abstract

MOTIVATION

Medical Subject Headings (MeSH) indexing, the assignment of a set of MeSH main headings to each citation, is crucial for many important tasks in biomedical text mining and information retrieval. Large-scale MeSH indexing poses challenges on two sides: the citation side and the MeSH side. On the citation side, all existing methods, including the Medical Text Indexer (MTI) from the National Library of Medicine and the state-of-the-art method MeSHLabeler, represent text as a bag of words, which cannot capture semantic and context-dependent information well.

METHODS

We propose DeepMeSH, which incorporates deep semantic information for large-scale MeSH indexing and addresses the challenges on both the citation side and the MeSH side. The citation-side challenge is addressed by a new deep semantic representation, D2V-TFIDF, which concatenates sparse and dense semantic representations. The MeSH-side challenge is addressed by using the 'learning to rank' framework of MeSHLabeler, which integrates various types of evidence generated from the new semantic representation.
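The idea of concatenating a sparse and a dense representation can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the smoothed IDF formula and the L2 normalization of each part are assumptions, and `placeholder_dense` is only a deterministic hashing stand-in for a learned document embedding (in DeepMeSH the dense part would come from a trained document-embedding model, which this sketch does not reproduce).

```python
import hashlib
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption made for brevity)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors), one vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF; the exact weighting scheme is an assumption.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[tok] * idf[tok] for tok in vocab])
    return vocab, vectors


def placeholder_dense(text, dim=8):
    """Hypothetical stand-in for a learned dense embedding (NOT doc2vec)."""
    vec = [0.0] * dim
    for tok in tokenize(text):
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        index = digest[0] % dim
        sign = 1.0 if digest[1] % 2 == 0 else -1.0
        vec[index] += sign
    return vec


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def d2v_tfidf(docs, dim=8):
    """Concatenate the normalized sparse and dense parts for each document."""
    _, sparse = tfidf_vectors(docs)
    return [
        l2_normalize(s) + l2_normalize(placeholder_dense(d, dim))
        for d, s in zip(docs, sparse)
    ]


if __name__ == "__main__":
    citations = [
        "insulin therapy for diabetes",
        "deep learning for text indexing",
    ]
    for features in d2v_tfidf(citations):
        print(len(features), [round(x, 3) for x in features])
```

Normalizing each part before concatenation keeps either representation from dominating purely because of its scale; whether the original system weights the two parts this way is not stated in the abstract.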

RESULTS

On BioASQ3 challenge data with 6000 citations, DeepMeSH achieved a Micro F-measure of 0.6323, about 2% higher in relative terms than the 0.6218 of MeSHLabeler and about 12% higher than the 0.5637 of MTI.
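For readers unfamiliar with the metric, the following sketch shows the standard micro-averaged F-measure (true positives, false positives and false negatives are pooled over all citations before precision and recall are computed) and the arithmetic behind the reported gains, which works out as relative rather than absolute differences. The toy gold and predicted heading sets are invented for illustration only, and the function names are hypothetical.

```python
def micro_f_measure(gold, predicted):
    """Micro-averaged F-measure over parallel lists of label sets."""
    tp = fp = fn = 0
    for gold_labels, predicted_labels in zip(gold, predicted):
        gold_labels, predicted_labels = set(gold_labels), set(predicted_labels)
        tp += len(gold_labels & predicted_labels)   # correctly assigned headings
        fp += len(predicted_labels - gold_labels)   # assigned but not in gold
        fn += len(gold_labels - predicted_labels)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Toy data, invented for illustration; not taken from BioASQ.
    gold = [{"Humans", "Insulin", "Diabetes Mellitus"}, {"Algorithms", "MEDLINE"}]
    predicted = [{"Humans", "Insulin"}, {"Algorithms", "Humans", "MEDLINE"}]
    print("toy micro-F:", micro_f_measure(gold, predicted))

    # Scores reported in the abstract.
    print("gain over MeSHLabeler:", relative_gain(0.6323, 0.6218))
    print("gain over MTI:", relative_gain(0.6323, 0.5637))
```

With the reported scores, the relative gains come to roughly 1.7% over MeSHLabeler and 12.2% over MTI, consistent with the rounded figures of 2% and 12%.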

AVAILABILITY AND IMPLEMENTATION

The software is available upon request.

CONTACT

zhusf@fudan.edu.cn

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/557c/4908368/8be05ea4b2d1/btw294f1p.jpg
