
Automatic Assignment of Non-Leaf MeSH Terms to Biomedical Articles.

Authors

Kavuluru Ramakanth, Rios Anthony

Affiliations

Division of Biomedical Informatics, Department of Biostatistics, University of Kentucky; Department of Computer Science, University of Kentucky.

Department of Computer Science, University of Kentucky.

Publication information

AMIA Annu Symp Proc. 2015 Nov 5;2015:697-706. eCollection 2015.

PMID:26958205
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4765689/

Abstract

Assigning labels from a hierarchical vocabulary is a well-known special case of multi-label classification, often modeled to maximize micro F1-score. However, building accurate binary classifiers for poorly performing labels in the hierarchy can improve both micro and macro F1-scores. In this paper, we propose and evaluate classification strategies involving descendant node instances to build better binary classifiers for non-leaf labels with the use-case of assigning Medical Subject Headings (MeSH) to biomedical articles. Librarians at the National Library of Medicine tag each biomedical article to be indexed by their PubMed information system with terms from the MeSH terminology, a biomedical conceptual hierarchy with over 27,000 terms. Human indexers look at each article's full text to assign a set of most suitable MeSH terms for indexing it. Several recent automated attempts focused on using the article title and abstract text to identify MeSH terms for the corresponding article. Despite these attempts, it is observed that assigning MeSH terms corresponding to certain non-leaf nodes of the MeSH hierarchy is particularly challenging. Non-leaf nodes are very important as they constitute one third of the total number of MeSH terms. Here, we demonstrate the effectiveness of exploiting training examples of descendant terms of non-leaf nodes in improving the performance of conventional classifiers for the corresponding non-leaf MeSH terms. Specifically, we focus on reducing the false positives (FPs) caused by descendant instances in traditional classifiers. Our methods are able to achieve a relative improvement of 7.5% in macro-F1 score while also increasing the micro-F1 score by 1.6% for a set of 500 non-leaf terms in the MeSH hierarchy. These results strongly indicate the critical role of incorporating hierarchical information in MeSH term prediction. To our knowledge, our effort is the first to demonstrate the role of hierarchical information in improving binary classifiers for non-leaf MeSH terms.
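
The abstract reports gains in both macro-F1 and micro-F1 from reducing false positives on weak labels. As a rough illustration of why such a fix tends to move macro-F1 more than micro-F1, here is a minimal sketch using the standard definitions of the two scores. The label names and all counts are hypothetical and are not taken from the paper; the code does not reproduce the authors' classifiers.

```python
from typing import Dict, Tuple

# Per-label confusion counts: (true positives, false positives, false negatives).
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(counts: Dict[str, Counts]) -> float:
    """Pool the counts over all labels, then compute a single F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


def macro_f1(counts: Dict[str, Counts]) -> float:
    """Compute F1 per label, then take the unweighted mean over labels."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


def relative_gain(before: float, after: float) -> float:
    """Relative improvement, e.g. 0.075 corresponds to a 7.5% relative gain."""
    return (after - before) / before


# Hypothetical counts: one frequent, well-classified label and one rare
# non-leaf label whose classifier produces many false positives.
baseline: Dict[str, Counts] = {
    "frequent_term": (900, 100, 100),
    "rare_non_leaf_term": (10, 30, 10),
}

# Same predictions, except 20 false positives of the rare label are removed.
improved = dict(baseline)
improved["rare_non_leaf_term"] = (10, 10, 10)

macro_gain = relative_gain(macro_f1(baseline), macro_f1(improved))
micro_gain = relative_gain(micro_f1(baseline), micro_f1(improved))
print(f"macro-F1 relative gain: {macro_gain:.3f}")
print(f"micro-F1 relative gain: {micro_gain:.3f}")
```

Because macro-F1 weights every label equally, cleaning up a rare label shifts it noticeably, while micro-F1, which is dominated by frequent labels, changes only slightly.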
