Convolutional Neural Networks for Biomedical Text Classification: Application in Indexing Biomedical Articles.

Author information

Rios Anthony, Kavuluru Ramakanth

Affiliations

Department of Computer Science, University of Kentucky, Lexington, Kentucky.

Division of Biomedical Informatics, Depts. of Biostatistics and Computer Science, University of Kentucky, Lexington, Kentucky.

Publication information

ACM BCB. 2015 Sep;2015:258-267. doi: 10.1145/2808719.2808746.

Abstract

Building high-accuracy text classifiers is an important task in biomedicine given the wealth of information hidden in unstructured narratives such as research articles and clinical documents. Because of large feature spaces, discriminative approaches such as logistic regression and support vector machines with n-gram and semantic features (e.g., named entities) have traditionally been used for text classification, with additional performance gains typically made through feature selection and ensemble approaches. In this paper, we demonstrate that a more direct approach using convolutional neural networks (CNNs) outperforms several traditional approaches in biomedical text classification, with the specific use case of assigning medical subject headings (MeSH terms) to biomedical articles. Trained annotators at the National Library of Medicine (NLM) assign an average of 13 codes to each biomedical article, thus semantically indexing scientific literature to support NLM's PubMed search system. Recent evidence suggests that effective automated efforts for MeSH term assignment start with binary classifiers for each term. In this paper, we use CNNs to build binary text classifiers and achieve an absolute improvement of over 3% in macro F-score over a set of selected hard-to-classify MeSH terms when compared with the best prior results on a public dataset. Additional experiments on 50 high-frequency terms in the dataset also show improvements with CNNs. Our results indicate the strong potential of CNNs in biomedical text classification tasks.
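The evaluation setup described in the abstract, one binary classifier per MeSH term scored by macro F-score, can be illustrated with a minimal sketch. It assumes that macro F-score means the unweighted mean of per-term F1 scores, which is one common definition; the abstract does not state which variant the authors use. The function names and the term labels `term_a` and `term_b` are hypothetical, and the toy labels are made up for illustration only.

```python
from typing import Dict, List, Tuple


def f_score(gold: List[int], pred: List[int]) -> float:
    """F1 for one binary classifier (1 = term assigned, 0 = not assigned)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is reported as 0 by convention in this sketch.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f_score(per_term: Dict[str, Tuple[List[int], List[int]]]) -> float:
    """Assumed definition: unweighted mean of per-term F1 scores."""
    scores = [f_score(gold, pred) for gold, pred in per_term.values()]
    return sum(scores) / len(scores)


# Toy data: (gold labels, predicted labels) for two hypothetical terms
# over the same four articles.
toy = {
    "term_a": ([1, 1, 0, 0], [1, 0, 0, 0]),
    "term_b": ([1, 0, 0, 0], [1, 1, 0, 0]),
}
macro = macro_f_score(toy)
```

Because every term contributes equally under this definition, a rare, hard-to-classify term weighs as much as a frequent one, which is why macro averaging suits an evaluation focused on difficult terms.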


