MeSH2Matrix: combining MeSH keywords and machine learning for biomedical relation classification based on PubMed.

Author information

Turki Houcemeddine, Dossou Bonaventure F P, Emezue Chris Chinenye, Owodunni Abraham Toluwase, Hadj Taieb Mohamed Ali, Ben Aouicha Mohamed, Ben Hassen Hanen, Masmoudi Afif

Affiliations

Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia.

Mila Quebec AI Institute, Montreal, Canada.

Publication information

J Biomed Semantics. 2024 Oct 2;15(1):18. doi: 10.1186/s13326-024-00319-w.

Abstract

Biomedical relation classification has been significantly improved by applying advanced machine learning techniques to the raw texts of scholarly publications. Despite this improvement, the reliance on large chunks of raw text makes these algorithms suffer in terms of generalization, precision, and reliability. Exploiting the distinctive characteristics of bibliographic metadata can prove effective in achieving better performance on this challenging task. In this paper, we introduce an approach for biomedical relation classification that uses the qualifiers of co-occurring Medical Subject Headings (MeSH). First, we introduce MeSH2Matrix, our dataset of 46,469 biomedical relations curated from PubMed publications using our approach. For each relation, the dataset includes a matrix that maps associations between the qualifiers of the subject MeSH keyword and those of the object MeSH keyword. It also specifies the corresponding Wikidata relation type and the superclass of semantic relations for each relation. Using MeSH2Matrix, we build and train three machine learning models (a Support Vector Machine [SVM], a dense model [D-Model], and a convolutional neural network [C-Net]) to evaluate the efficiency of our approach for biomedical relation classification. Our best model achieves an accuracy of 70.78% for 195 classes and 83.09% for five superclasses. Finally, we provide confusion matrix and extensive feature analyses to better examine the relationship between the MeSH qualifiers and the biomedical relations being classified. Our results will hopefully shed light on developing better algorithms for biomedical ontology classification based on the MeSH keywords of PubMed publications. For reproducibility purposes, MeSH2Matrix and all our source code are publicly accessible at https://github.com/SisonkeBiotik-Africa/MeSH2Matrix.
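To make the idea of a qualifier association matrix concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each publication is represented as a mapping from MeSH headings to their qualifiers, and that the association between a subject qualifier and an object qualifier is a simple co-occurrence count. The qualifier list, the `qualifier_matrix` function, and the sample records are all hypothetical and chosen only for illustration; the abstract does not specify how the published matrices are computed.

```python
from itertools import product

# Hypothetical, shortened qualifier list (an assumption for this sketch).
QUALIFIERS = ["drug therapy", "adverse effects", "therapeutic use", "genetics"]
INDEX = {q: i for i, q in enumerate(QUALIFIERS)}


def qualifier_matrix(publications, subject, obj):
    """Count qualifier co-occurrences between a subject and an object heading.

    `publications` is a list of dicts mapping a MeSH heading to the list of
    qualifiers attached to it in one publication (assumed input format).
    Rows index subject qualifiers, columns index object qualifiers.
    """
    n = len(QUALIFIERS)
    matrix = [[0] * n for _ in range(n)]
    for pub in publications:
        subj_quals = pub.get(subject)
        obj_quals = pub.get(obj)
        # Only publications that mention both headings contribute.
        if not subj_quals or not obj_quals:
            continue
        for s, o in product(subj_quals, obj_quals):
            # Skip qualifiers outside the (hypothetical) vocabulary.
            if s in INDEX and o in INDEX:
                matrix[INDEX[s]][INDEX[o]] += 1
    return matrix


# Invented sample records, for illustration only.
pubs = [
    {"Aspirin": ["therapeutic use"], "Headache": ["drug therapy"]},
    {"Aspirin": ["therapeutic use", "adverse effects"], "Headache": ["drug therapy"]},
    {"Aspirin": ["adverse effects"]},  # object heading absent, so ignored
]
matrix = qualifier_matrix(pubs, "Aspirin", "Headache")
```

Under these assumptions, a matrix of this kind could be flattened into a feature vector for an SVM or dense model, or kept two-dimensional as input to a convolutional network, which matches the three model families named in the abstract.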

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0bf9/11445994/54a0200fb8f0/13326_2024_319_Fig1_HTML.jpg
