Leveraging Wikipedia knowledge to classify multilingual biomedical documents.

Affiliation

Department of Telematics Engineering, University of Vigo, Campus Lagoas-Marcosende, 36310 Vigo, Spain.

Publication information

Artif Intell Med. 2018 Jun;88:37-57. doi: 10.1016/j.artmed.2018.04.007. Epub 2018 May 3.

DOI: 10.1016/j.artmed.2018.04.007
PMID: 29730047
Abstract

This article presents a classifier that leverages Wikipedia knowledge to represent documents as vectors of concept weights, and analyses its suitability for classifying biomedical documents written in any language when it is trained only with English documents. We propose the cross-language concept matching technique, which relies on Wikipedia interlanguage links to convert concept vectors between languages. The performance of the classifier is compared to a classifier based on machine translation, and two classifiers based on MetaMap. To perform the experiments, we created two multilingual corpora. The first one, Multi-Lingual UVigoMED (ML-UVigoMED), is composed of 23,647 Wikipedia documents about biomedical topics written in English, German, French, Spanish, Italian, Galician, Romanian, and Icelandic. The second one, English-French-Spanish-German UVigoMED (EFSG-UVigoMED), is composed of 19,210 biomedical abstracts extracted from MEDLINE and written in English, French, Spanish, and German. The performance of the proposed approach is superior to that of any of the state-of-the-art classifiers in the benchmark. We conclude that leveraging Wikipedia knowledge is of great advantage in the multilingual classification of biomedical documents.
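
The abstract describes cross-language concept matching only at a high level: concept vectors are converted between languages through Wikipedia interlanguage links. The following is a minimal sketch of that idea, not the authors' implementation. The link table `ES_TO_EN`, the toy document vector, and the function names are hypothetical. Dropping concepts that have no interlanguage link and summing weights that map to the same target concept are assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def map_concept_vector(vector, links):
    """Convert a {concept: weight} vector into the target language.

    `links` maps a source-language concept title to its target-language
    title, as an interlanguage-link table would. Concepts without a link
    are dropped (assumption); weights landing on the same target concept
    are summed (assumption).
    """
    mapped = defaultdict(float)
    for concept, weight in vector.items():
        target = links.get(concept)
        if target is not None:
            mapped[target] += weight
    return dict(mapped)


def cosine(a, b):
    """Cosine similarity between two sparse {concept: weight} vectors."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical Spanish-to-English interlanguage links (toy data).
ES_TO_EN = {
    "Diabetes mellitus": "Diabetes",
    "Insulina": "Insulin",
    "Páncreas": "Pancreas",
}

# Hypothetical concept vector of a Spanish document.
spanish_doc = {"Diabetes mellitus": 0.7, "Insulina": 0.5, "Glucemia": 0.2}

# The converted vector lives in the English concept space, so it can be
# compared with vectors built from English training documents.
english_doc = map_concept_vector(spanish_doc, ES_TO_EN)
english_profile = {"Diabetes": 1.0, "Insulin": 1.0}
similarity = cosine(english_doc, english_profile)
```

Once a document's vector is expressed in English concepts, a classifier trained only on English documents can score it directly, which is the property the abstract highlights. In practice the link table would be built from Wikipedia's interlanguage links rather than written by hand.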

Similar articles

1
Leveraging Wikipedia knowledge to classify multilingual biomedical documents.
Artif Intell Med. 2018 Jun;88:37-57. doi: 10.1016/j.artmed.2018.04.007. Epub 2018 May 3.
2
A Bag of Concepts Approach for Biomedical Document Classification Using Wikipedia Knowledge*. Spanish-English Cross-language Case Study.
Methods Inf Med. 2017 Oct 26;56(5):370-376. doi: 10.3414/ME17-01-0028. Epub 2017 Aug 16.
3
Biomedical literature classification using encyclopedic knowledge: a Wikipedia-based bag-of-concepts approach.
PeerJ. 2015 Sep 29;3:e1279. doi: 10.7717/peerj.1279. eCollection 2015.
4
A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC.
J Am Med Inform Assoc. 2015 Sep;22(5):948-56. doi: 10.1093/jamia/ocv037. Epub 2015 May 6.
5
Evaluating semantic relations in neural word embeddings with biomedical and general domain knowledge bases.
BMC Med Inform Decis Mak. 2018 Jul 23;18(Suppl 2):65. doi: 10.1186/s12911-018-0630-x.
6
Large scale biomedical texts classification: a kNN and an ESA-based approaches.
J Biomed Semantics. 2016 Jun 16;7:40. doi: 10.1186/s13326-016-0073-1.
7
Knowledge based word-concept model estimation and refinement for biomedical text mining.
J Biomed Inform. 2015 Feb;53:300-7. doi: 10.1016/j.jbi.2014.11.015. Epub 2014 Dec 12.
8
SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes.
BMC Bioinformatics. 2018 Nov 6;19(1):405. doi: 10.1186/s12859-018-2429-2.
9
Understanding Editing Behaviors in Multilingual Wikipedia.
PLoS One. 2016 May 12;11(5):e0155305. doi: 10.1371/journal.pone.0155305. eCollection 2016.
10
Automatically Expanding the Synonym Set of SNOMED CT using Wikipedia.
Stud Health Technol Inform. 2015;216:619-23.

Cited by

1
Clinical document corpora-real ones, translated and synthetic substitutes, and assorted domain proxies: a survey of diversity in corpus design, with focus on German text data.
JAMIA Open. 2025 May 14;8(3):ooaf024. doi: 10.1093/jamiaopen/ooaf024. eCollection 2025 Jun.
2
Transformer-Based Language Models for Group Randomized Trial Classification in Biomedical Literature: Model Development and Validation.
JMIR Med Inform. 2025 May 9;13:e63267. doi: 10.2196/63267.
3
AI-driven streamlined modeling: experiences and lessons learned from multiple domains.
Softw Syst Model. 2022;21(3):1-23. doi: 10.1007/s10270-022-00982-6. Epub 2022 Feb 19.
4
Artificial intelligence-Developments in medicine in the last two years.
Chronic Dis Transl Med. 2019 Jan 9;5(1):64-68. doi: 10.1016/j.cdtm.2018.11.004. eCollection 2019 Mar.