A Maximum-Entropy approach for accurate document annotation in the biomedical domain.

Authors

Tsatsaronis George, Macari Natalia, Torge Sunna, Dietze Heiko, Schroeder Michael

Affiliation

Biotechnology Center (BIOTEC), Technische Universität Dresden, 01307 Dresden, Germany.

Publication

J Biomed Semantics. 2012 Apr 24;3 Suppl 1(Suppl 1):S2. doi: 10.1186/2041-1480-3-S1-S2.

DOI:10.1186/2041-1480-3-S1-S2
PMID:22541593
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3337257/

Abstract

The growing volume of scientific literature on the Web and the absence of efficient tools for classifying and searching documents are the two most important factors that influence the speed of the search and the quality of the results. Previous studies have shown that ontologies make it possible to process document and query information at the semantic level, which greatly improves the search for relevant information and takes a step towards the Semantic Web. A fundamental step in these approaches is the annotation of documents with ontology concepts, which can also be seen as a classification task. In this paper we address this issue for the biomedical domain and present a new automated and robust method, based on a Maximum Entropy approach, for annotating biomedical literature documents with terms from the Medical Subject Headings (MeSH).

The experimental evaluation shows that the suggested Maximum Entropy approach for annotating biomedical documents with MeSH terms is highly accurate, robust to term ambiguity, and performs very well even when a very small number of training documents is used. More precisely, we show that the proposed algorithm obtained an average F-measure of 92.4% (precision 99.41%, recall 86.77%) over the full range of explored terms (4,078 MeSH terms), and that its performance is resilient to term ambiguity, achieving an average F-measure of 92.42% (precision 99.32%, recall 86.87%) on the explored MeSH terms that are ambiguous according to the Unified Medical Language System (UMLS) thesaurus. Finally, we compared the results of the suggested methodology with a Naive Bayes and a Decision Trees classification approach, and we show that the Maximum Entropy based approach achieved a higher F-measure on both ambiguous and monosemous MeSH terms.
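The figures in the abstract can be sanity-checked with a few lines of Python. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly. Plugging in the averaged precision and recall gives values slightly above the reported 92.4% and 92.42%. One plausible reading, again an assumption, is that the reported F-measure is the mean of per-term F scores rather than the F score of the averaged precision and recall. The `per_term` values are hypothetical and only illustrate that the two ways of averaging can differ.

```python
from statistics import mean


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Averaged precision and recall as reported in the abstract, in percent.
all_terms = f_measure(99.41, 86.77)  # all 4,078 explored MeSH terms
ambiguous = f_measure(99.32, 86.87)  # terms ambiguous according to UMLS
print(f"F1 of averaged P/R, all terms: {all_terms:.2f}")
print(f"F1 of averaged P/R, ambiguous terms: {ambiguous:.2f}")

# Hypothetical per-term (precision, recall) pairs, NOT taken from the paper.
per_term = [(1.0, 1.0), (1.0, 0.5)]

# Average the per-term F scores (macro-averaged F).
mean_of_f = mean(f_measure(p, r) for p, r in per_term)

# Average precision and recall first, then compute a single F score.
f_of_means = f_measure(
    mean(p for p, _ in per_term),
    mean(r for _, r in per_term),
)
print(f"mean of per-term F: {mean_of_f:.3f}, F of mean P/R: {f_of_means:.3f}")
```

With these toy inputs the mean of per-term F scores comes out lower than the F score of the averaged precision and recall, which is the same direction as the small gap seen in the reported numbers.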
