pyMeSHSim: an integrative python package for biomedical named entity recognition, normalization, and comparison of MeSH terms.

Affiliations

Hubei Key Laboratory of Agricultural Bioinformatics, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, Hubei, 430070, PR China.

College of Biomedicine and Health, Huazhong Agricultural University, Wuhan, Hubei, 430070, PR China.

Publication information

BMC Bioinformatics. 2020 Jun 18;21(1):252. doi: 10.1186/s12859-020-03583-6.

Abstract

BACKGROUND

Many disease-causing genes have been identified by various methods, but the disease phenotypes of these genes still lack uniform annotation with biomedical named entities (bio-NEs). Furthermore, comparing the semantic similarity of two bio-NE annotations has become important for data integration and systems genetics analysis.

RESULTS

pyMeSHSim recognizes bio-NEs using MetaMap, a natural language processing tool that maps text to Unified Medical Language System (UMLS) concepts. To map these UMLS concepts to Medical Subject Headings (MeSH), pyMeSHSim embeds an in-house dataset containing the MeSH main headings (MHs), supplementary concept records (SCRs), and the relations among them. Based on this dataset, pyMeSHSim implements four information content (IC)-based algorithms and one graph-based algorithm for measuring the semantic similarity between two MeSH terms. To evaluate its performance, we used pyMeSHSim to parse OMIM and GWAS phenotypes. Including SCRs and adopting a curation strategy for UMLS concepts that have no MeSH synonym improved pyMeSHSim's recognition of OMIM phenotypes. In the curation of 461 GWAS phenotypes, pyMeSHSim achieved recall > 0.94, precision > 0.56, and F1 > 0.70, outperforming the state-of-the-art tools DNorm and TaggerOne in recognizing MeSH terms from short biomedical phrases. The semantic similarity between the MeSH terms recognized by pyMeSHSim and those from previous manual curation was calculated both with pyMeSHSim and with another semantic analysis tool, meshes. The similarity scores from the two tools were highly correlated, with correlations ranging from 0.89 to 0.99.
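
The abstract does not name the similarity measures pyMeSHSim implements. As an illustration only, the following Python sketch shows how two widely used IC-based measures (Resnik and Lin) and Wang's graph-based measure can be computed on a small, hypothetical MeSH-like hierarchy. The term hierarchy, annotation counts, function names, and the 0.5 edge weight are assumptions made for this example; this is not pyMeSHSim's API or its implementation.

```python
import math
from collections import deque

# Hypothetical toy hierarchy: term -> list of parent terms.
# MeSH descriptors can sit under several tree branches, so this is a DAG, not a tree.
# Names and structure are illustrative only, not real MeSH tree data.
PARENTS = {
    "Diseases": [],
    "Neoplasms": ["Diseases"],
    "Nervous System Diseases": ["Diseases"],
    "Brain Neoplasms": ["Neoplasms", "Nervous System Diseases"],
    "Glioma": ["Brain Neoplasms"],
    "Meningioma": ["Brain Neoplasms"],
    "Epilepsy": ["Nervous System Diseases"],
}

# Hypothetical counts of how often each term is used directly as an annotation.
COUNTS = {
    "Diseases": 10,
    "Neoplasms": 20,
    "Nervous System Diseases": 20,
    "Brain Neoplasms": 10,
    "Glioma": 20,
    "Meningioma": 10,
    "Epilepsy": 10,
}


def ancestors(term):
    """Return the term itself plus every ancestor reachable through parent links."""
    seen = {term}
    queue = deque([term])
    while queue:
        for parent in PARENTS[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def information_content(term):
    """IC(t) = -ln p(t), where p(t) is the share of annotations made to t or any descendant."""
    total = sum(COUNTS.values())
    freq = sum(COUNTS[t] for t in PARENTS if term in ancestors(t))
    return -math.log(freq / total)


def resnik(a, b):
    """Resnik: IC of the most informative common ancestor (MICA)."""
    common = ancestors(a) & ancestors(b)
    return max((information_content(t) for t in common), default=0.0)


def lin(a, b):
    """Lin: the MICA's IC normalized by the ICs of the two terms (range 0..1)."""
    denom = information_content(a) + information_content(b)
    return 2 * resnik(a, b) / denom if denom > 0 else 1.0


def wang_s_values(term, weight=0.5):
    """Semantic contribution of each ancestor: S(term) = 1, S(parent) = max over children of weight * S(child)."""
    s = {term: 1.0}
    queue = deque([term])
    while queue:
        child = queue.popleft()
        for parent in PARENTS[child]:
            value = weight * s[child]
            if value > s.get(parent, 0.0):
                s[parent] = value
                queue.append(parent)
    return s


def wang(a, b, weight=0.5):
    """Wang graph-based similarity: shared contributions divided by total semantic value."""
    sa, sb = wang_s_values(a, weight), wang_s_values(b, weight)
    shared = sum(sa[t] + sb[t] for t in sa.keys() & sb.keys())
    return shared / (sum(sa.values()) + sum(sb.values()))


if __name__ == "__main__":
    print(round(resnik("Glioma", "Meningioma"), 3))  # 0.916
    print(round(lin("Glioma", "Meningioma"), 3))     # 0.468
    print(round(wang("Glioma", "Meningioma"), 3))    # 0.529
```

IC-based measures depend on the annotation corpus used to estimate term frequencies, whereas the graph-based measure depends only on the structure of the hierarchy, which is one reason a tool may offer both kinds.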

CONCLUSIONS

pyMeSHSim, an integrative MeSH tool with embedded MeSH MHs and SCRs, supports bio-NE recognition, normalization, and comparison in biomedical text mining.
