
Large-scale online semantic indexing of biomedical articles via an ensemble of multi-label classification models.

Authors

Papanikolaou Yannis, Tsoumakas Grigorios, Laliotis Manos, Markantonatos Nikos, Vlahavas Ioannis

Affiliations

Department of Computer Science, Aristotle University, Thessaloniki, 54124, Greece.

Atypon, 5201 Great America Parkway Suite 510, Santa Clara, 95054, CA, USA.

Publication information

J Biomed Semantics. 2017 Sep 22;8(1):43. doi: 10.1186/s13326-017-0150-0.

DOI:10.1186/s13326-017-0150-0
PMID:28938902
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5610407/
Abstract

BACKGROUND

In this paper we present the approach that we employed for large-scale multi-label semantic indexing of biomedical papers. This work was carried out mainly within the context of the BioASQ challenge (2013-2017), a challenge concerned with biomedical semantic indexing and question answering.

METHODS

Our main contribution is a MUlti-Label Ensemble method (MULE) that incorporates a McNemar statistical significance test in order to validate the combination of the constituent machine learning algorithms. Secondary contributions include a study of the temporal aspects of the BioASQ corpus (the observations also apply to its superset, the PubMed article collection) and the proper parametrization of the algorithms used for this challenging classification task.
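The abstract does not spell out how the McNemar test enters the ensemble, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that, for each label, the ensemble switches away from a default model only when a candidate model is significantly better on validation data. The function names `mcnemar_p_value` and `choose_model_for_label`, the `alpha` threshold, and the chi-square approximation with continuity correction are all illustrative choices.

```python
import math


def mcnemar_p_value(y_true, pred_a, pred_b):
    """Two-sided McNemar test (chi-square approximation, continuity-corrected).

    Only the discordant pairs matter: examples where exactly one of the
    two classifiers is correct.
    """
    # b: classifier A correct, classifier B wrong
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    # c: classifier A wrong, classifier B correct
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        return 1.0  # the classifiers never disagree in correctness
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))


def choose_model_for_label(y_true, preds_by_model, default_model, alpha=0.05):
    """Pick a model for one label (hypothetical selection rule).

    The model with the fewest validation errors replaces the default only
    if the McNemar test finds the difference significant at level alpha.
    """
    def errors(name):
        return sum(1 for t, p in zip(y_true, preds_by_model[name]) if t != p)

    best = min(preds_by_model, key=errors)
    if best == default_model:
        return default_model
    p = mcnemar_p_value(y_true, preds_by_model[best], preds_by_model[default_model])
    return best if p < alpha else default_model
```

Calling `choose_model_for_label` once per label with binary validation predictions yields a per-label model assignment. For rare labels the discordant counts are small, so the test rarely rejects and the default model is retained, which is one plausible reading of why a significance test helps when many labels are rare.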

RESULTS

The ensemble method that we developed is compared to other approaches in experimental scenarios with subsets of the BioASQ corpus, giving positive results. In our participation in the BioASQ challenge we obtained first place in 2013 and second place in the four following years, consistently outperforming MTI, the indexing system of the National Library of Medicine (NLM).

CONCLUSIONS

The results of our experimental comparisons suggest that employing a statistical significance test to validate the ensemble method's choices is the optimal approach for ensembling multi-label classifiers, especially in contexts with many rare labels.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3690/5610407/5557275fa2b1/13326_2017_150_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3690/5610407/a325e1826f8b/13326_2017_150_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3690/5610407/a9d01a265e33/13326_2017_150_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3690/5610407/874318cc8e03/13326_2017_150_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3690/5610407/1a3b6ff53ea2/13326_2017_150_Fig5_HTML.jpg

Similar articles

1
Large-scale online semantic indexing of biomedical articles via an ensemble of multi-label classification models.
J Biomed Semantics. 2017 Sep 22;8(1):43. doi: 10.1186/s13326-017-0150-0.
2
An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition.
BMC Bioinformatics. 2015 Apr 30;16:138. doi: 10.1186/s12859-015-0564-6.
3
Large scale biomedical texts classification: a kNN and an ESA-based approaches.
J Biomed Semantics. 2016 Jun 16;7:40. doi: 10.1186/s13326-016-0073-1.
4
Biomedical semantic indexing by deep neural network with multi-task learning.
BMC Bioinformatics. 2018 Dec 21;19(Suppl 20):502. doi: 10.1186/s12859-018-2534-2.
5
12 years on - Is the NLM medical text indexer still useful and relevant?
J Biomed Semantics. 2017 Feb 23;8(1):8. doi: 10.1186/s13326-017-0113-5.
6
Multi-label biomedical question classification for lexical answer type prediction.
J Biomed Inform. 2019 May;93:103143. doi: 10.1016/j.jbi.2019.103143. Epub 2019 Mar 12.
7
MeSHLabeler and DeepMeSH: Recent Progress in Large-Scale MeSH Indexing.
Methods Mol Biol. 2018;1807:203-209. doi: 10.1007/978-1-4939-8561-6_15.
8
A Machine Learning-based Method for Question Type Classification in Biomedical Question Answering.
Methods Inf Med. 2017 May 18;56(3):209-216. doi: 10.3414/ME16-01-0116. Epub 2017 Mar 31.
9
DeepMeSH: deep semantic representation for improving large-scale MeSH indexing.
Bioinformatics. 2016 Jun 15;32(12):i70-i79. doi: 10.1093/bioinformatics/btw294.
10
MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence.
Bioinformatics. 2015 Jun 15;31(12):i339-47. doi: 10.1093/bioinformatics/btv237.

Cited by

1
Methodologically grounded semantic analysis of large volume of chilean medical literature data applied to the analysis of medical research funding efficiency in Chile.
J Biomed Semantics. 2020 Sep 29;11(1):12. doi: 10.1186/s13326-020-00226-w.
2
Towards the Inference of Social and Behavioral Determinants of Sexual Health: Development of a Gold-Standard Corpus with Semi-Supervised Learning.
AMIA Annu Symp Proc. 2018 Dec 5;2018:422-429. eCollection 2018.
3
Few-Shot and Zero-Shot Multi-Label Learning for Structured Label Spaces.
Proc Conf Empir Methods Nat Lang Process. 2018 Oct-Nov;2018:3132-3142.

References

1
The McNemar test for binary matched-pairs data: mid-p and asymptotic are better than exact conditional.
BMC Med Res Methodol. 2013 Jul 13;13:91. doi: 10.1186/1471-2288-13-91.
2
Recommending MeSH terms for annotating biomedical articles.
J Am Med Inform Assoc. 2011 Sep-Oct;18(5):660-7. doi: 10.1136/amiajnl-2010-000055. Epub 2011 May 25.
4
MeSH Now: automatic MeSH indexing at PubMed scale via learning to rank.
J Biomed Semantics. 2017 Apr 17;8(1):15. doi: 10.1186/s13326-017-0123-3.