Natural language processing (NLP) tools in extracting biomedical concepts from research articles: a case study on autism spectrum disorder.

Author information

School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, 19104, USA.

Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.

Publication information

BMC Med Inform Decis Mak. 2020 Dec 30;20(Suppl 11):322. doi: 10.1186/s12911-020-01352-2.

Abstract

BACKGROUND

Natural language processing (NLP) tools can facilitate the extraction of biomedical concepts from unstructured free text, such as research articles or clinical notes. CLAMP, cTAKES, and MetaMap are among the most widely used NLP tools for extracting biomedical concept entities. However, their performance in extracting disease-specific terminology from the literature has not been compared extensively, especially for complex neuropsychiatric disorders with diverse phenotypic and clinical manifestations.

METHODS

We compared these NLP tools using autism spectrum disorder (ASD) as a case study. We collected 827 ASD-related terms from previous literature as the benchmark list for performance evaluation. We then applied CLAMP, cTAKES, and MetaMap to 544 full-text articles and 20,408 abstracts from PubMed to extract ASD-related terms. We evaluated predictive performance using precision, recall, and F1 score.
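The three metrics named above can be illustrated with a minimal sketch. The abstract does not say how an extracted term was matched against the benchmark list, so the exact-match comparison after lowercasing and whitespace normalization used here is an assumption, and the function names `normalize` and `evaluate` as well as the toy term lists are hypothetical, not taken from the study.

```python
def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(term.lower().split())


def evaluate(extracted, benchmark):
    """Return (precision, recall, f1) for extracted terms against a benchmark list.

    Assumption: a term counts as correct only if it exactly matches a
    benchmark term after normalization.
    """
    extracted_set = {normalize(t) for t in extracted}
    benchmark_set = {normalize(t) for t in benchmark}
    true_positives = len(extracted_set & benchmark_set)

    # Precision: share of extracted terms that appear in the benchmark.
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    # Recall: share of benchmark terms that the tool recovered.
    recall = true_positives / len(benchmark_set) if benchmark_set else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy, made-up data purely for illustration (not from the study).
benchmark_terms = ["repetitive behavior", "echolalia", "social anxiety", "speech delay"]
tool_output = ["Echolalia", "speech  delay", "headache"]

precision, recall, f1 = evaluate(tool_output, benchmark_terms)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under this definition, a tool that emits few but mostly correct terms scores high on precision and low on recall, while a tool that emits many candidate terms tends toward the opposite; F1 balances the two.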

RESULTS

We found that CLAMP has the best performance in terms of F1 score followed by cTAKES and then MetaMap. Our results show that CLAMP has much higher precision than cTAKES and MetaMap, while cTAKES and MetaMap have higher recall than CLAMP.

CONCLUSION

The analysis protocols used in this study can be applied to other neuropsychiatric or neurodevelopmental disorders that lack well-defined terminology sets to describe their phenotypic presentations.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2234/7772897/30989ba46b3d/12911_2020_1352_Fig1_HTML.jpg
