MeSH2Matrix: combining MeSH keywords and machine learning for biomedical relation classification based on PubMed.

Authors

Turki Houcemeddine, Dossou Bonaventure F P, Emezue Chris Chinenye, Owodunni Abraham Toluwase, Hadj Taieb Mohamed Ali, Ben Aouicha Mohamed, Ben Hassen Hanen, Masmoudi Afif

Affiliations

Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia.

Mila Quebec AI Institute, Montreal, Canada.

Publication information

J Biomed Semantics. 2024 Oct 2;15(1):18. doi: 10.1186/s13326-024-00319-w.

DOI:10.1186/s13326-024-00319-w

PMID:39354632

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11445994/

Abstract

Biomedical relation classification has been significantly improved by applying advanced machine learning techniques to the raw text of scholarly publications. Despite this improvement, reliance on large chunks of raw text limits the generalization, precision, and reliability of these algorithms. The distinctive characteristics of bibliographic metadata can prove effective in achieving better performance on this challenging task. In this paper, we introduce an approach to biomedical relation classification that uses the qualifiers of co-occurring Medical Subject Headings (MeSH). First, we introduce MeSH2Matrix, our dataset of 46,469 biomedical relations curated from PubMed publications using our approach. The dataset includes a matrix that maps associations between the qualifiers of subject MeSH keywords and those of object MeSH keywords. It also specifies the corresponding Wikidata relation type and the superclass of semantic relations for each relation. Using MeSH2Matrix, we build and train three machine learning models (a Support Vector Machine [SVM], a dense model [D-Model], and a convolutional neural network [C-Net]) to evaluate the efficiency of our approach for biomedical relation classification. Our best model achieves an accuracy of 70.78% for 195 classes and 83.09% for five superclasses. Finally, we provide confusion matrix and extensive feature analyses to better examine the relationship between the MeSH qualifiers and the biomedical relations being classified. Our results will hopefully shed light on developing better algorithms for biomedical ontology classification based on the MeSH keywords of PubMed publications. For reproducibility, MeSH2Matrix and all our source code are publicly accessible at https://github.com/SisonkeBiotik-Africa/MeSH2Matrix.
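The qualifier matrix described in the abstract can be illustrated with a minimal sketch. The abstract does not state how associations are counted or normalized, so the code below simply assumes raw co-occurrence counts of qualifier pairs. The function name `qualifier_matrix`, the input record shape, the short `QUALIFIERS` list, and the toy records are all hypothetical and are not taken from the MeSH2Matrix repository.

```python
from collections import Counter
from itertools import product

# Hypothetical, shortened qualifier vocabulary; the real MeSH list is longer.
QUALIFIERS = ["drug therapy", "therapeutic use", "adverse effects", "etiology"]


def qualifier_matrix(publications, subject, obj, qualifiers=QUALIFIERS):
    """Count qualifier pairs over publications indexed with both headings.

    Each publication is a dict mapping a MeSH heading to the set of
    qualifiers attached to it in that record (an assumed input shape).
    """
    counts = Counter()
    for record in publications:
        if subject in record and obj in record:
            # Every (subject qualifier, object qualifier) pair in this record.
            counts.update(product(record[subject], record[obj]))
    # Rows follow the subject's qualifiers, columns the object's.
    return [[counts[(row, col)] for col in qualifiers] for row in qualifiers]


# Toy records invented for illustration only.
publications = [
    {"Aspirin": {"therapeutic use"}, "Headache": {"drug therapy"}},
    {"Aspirin": {"therapeutic use", "adverse effects"}, "Headache": {"drug therapy"}},
    {"Aspirin": {"adverse effects"}, "Ulcer": {"etiology"}},
]

matrix = qualifier_matrix(publications, "Aspirin", "Headache")
```

In a setup like this, each subject-object pair yields one fixed-size matrix, which could then be flattened for an SVM or dense model, or kept two-dimensional for a convolutional network.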

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0bf9/11445994/54a0200fb8f0/13326_2024_319_Fig1_HTML.jpg
