
Named entity recognition on bio-medical literature documents using hybrid based approach.

Author information

Ramachandran R, Arutchelvan K

Affiliation

Department of Computer and Information Science, Annamalai University, Tamil Nadu, Chidambaram, India.

Publication information

J Ambient Intell Humaniz Comput. 2021 Mar 11:1-10. doi: 10.1007/s12652-021-03078-z.

Abstract

The medical field has changed considerably as a result of technological advances. Progress in technology offers many opportunities to extract valuable insights from large amounts of unstructured data. The literature published by researchers in the medical domain contains an enormous amount of knowledge, and many organizations are working to retrieve the information hidden in these documents. Owing to innovations in Natural Language Processing, extracting drug names, diseases, symptoms, routes of administration, species and dosage forms from textual documents is an easy task. In this article, a new hybrid approach is proposed to identify named entities in medical literature documents. New dictionaries have been built for routes of administration, dosage forms and symptoms to annotate these entities in medical documents. The annotated entities are used to train a blank spaCy machine learning model. The trained model achieves decent accuracy compared with the existing model. The hybrid model is validated against the dictionary and, optionally, by a human annotator to calculate the confusion matrix. It identifies more entities than the prevailing model. The average F1 score of the proposed hybrid approach across five entities is 73.79%.
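The dictionary-based annotation and evaluation steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The tiny DICTIONARY, the entity labels, the example sentence and the reference (gold) spans are all hypothetical. Text is split on whitespace, dictionary lookup is greedy longest-match, and a predicted entity counts as correct only if its span and label exactly match a reference span. The per-label TP/FP/FN counts stand in for the confusion matrix, and the macro average is the unweighted mean of per-label F1 scores. Training the blank spaCy model on these annotations is not shown.

```python
from collections import defaultdict

# Hypothetical dictionary (NOT the authors' dictionaries): lower-cased term -> label.
DICTIONARY = {
    "oral": "ROUTE",
    "intravenous": "ROUTE",
    "tablet": "DOSAGE_FORM",
    "oral suspension": "DOSAGE_FORM",
    "fever": "SYMPTOM",
    "headache": "SYMPTOM",
}


def annotate(text, dictionary, max_len=3):
    """Greedy longest-match dictionary lookup.

    Commas and full stops are replaced by spaces, the text is lower-cased and
    split on whitespace. Returns a set of (start_token, end_token, label)
    spans, with end_token exclusive.
    """
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    spans = set()
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first so multi-word terms win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                spans.add((i, i + n, dictionary[phrase]))
                i += n
                break
        else:
            i += 1  # no dictionary term starts at this token
    return spans


def per_label_scores(gold, predicted):
    """Exact-match TP/FP/FN counts per label, then (precision, recall, F1)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for span in predicted:
        counts[span[2]]["tp" if span in gold else "fp"] += 1
    for span in gold - predicted:
        counts[span[2]]["fn"] += 1
    scores = {}
    for label, c in counts.items():
        p = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
        r = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = (p, r, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-label F1 scores."""
    if not scores:
        return 0.0
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


if __name__ == "__main__":
    sentence = "Patients with fever and nausea received an oral tablet."
    predicted = annotate(sentence, DICTIONARY)
    # Hypothetical human reference: also marks "nausea" (token 4) as a symptom.
    gold = predicted | {(4, 5, "SYMPTOM")}
    scores = per_label_scores(gold, predicted)
    for label, (p, r, f1) in sorted(scores.items()):
        print(f"{label}: P={p:.2f} R={r:.2f} F1={f1:.2f}")
    print(f"macro F1 = {macro_f1(scores):.3f}")  # 0.889 for this toy example
```

In this toy example "nausea" is missing from the dictionary, so SYMPTOM recall falls to 0.5 and the macro F1 over the three labels is 8/9. Exact-span matching is a deliberately strict choice. A partial-overlap criterion would report higher scores on the same predictions.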

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8a97/7947151/5bd10e88668d/12652_2021_3078_Fig1_HTML.jpg