Department of Computer Science, Columbia University, New York City, NY, USA.
School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.
AMIA Jt Summits Transl Sci Proc. 2021 May 17;2021:276-285. eCollection 2021.
This paper describes an initial dataset and an automatic natural language processing (NLP) method for extracting concepts related to precision oncology from biomedical research articles. We extract five concept types: Cancer, Mutation, Population, Treatment, and Outcome. A corpus of 250 biomedical abstracts was annotated with these concepts following standard double-annotation procedures. We then experiment with BERT-based models for concept extraction. The best-performing model achieved a precision of 63.8%, a recall of 71.9%, and an F1 score of 67.1%. Finally, we propose directions for future research to improve extraction performance and to apply the NLP system in downstream precision oncology applications.
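The abstract does not state how the BERT-based extractor encodes concepts or how precision, recall, and F1 are computed. A common setup, assumed here and not confirmed by the paper, frames concept extraction as token classification with BIO tags (e.g. `B-Cancer`, `I-Cancer`, `O`) and scores predictions with strict span matching: a predicted concept counts as correct only if its boundaries and type both match a gold annotation, with counts micro-averaged over all abstracts. The function names `bio_to_spans` and `span_prf` are hypothetical and used only for illustration.

```python
from typing import List, Set, Tuple

# The five concept types named in the abstract.
CONCEPT_TYPES = {"Cancer", "Mutation", "Population", "Treatment", "Outcome"}

# (start token index, end token index exclusive, concept type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed concept spans.

    Assumption: an 'I-' tag that does not continue an open span of the same
    type starts a new span (a lenient convention, not the paper's stated rule).
    """
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final open span
        prefix, _, typ = tag.partition("-")
        if prefix in ("B", "I") and typ not in CONCEPT_TYPES:
            raise ValueError(f"unknown concept type in tag {tag!r}")
        continues = prefix == "I" and label == typ
        if start is not None and not continues:
            spans.add((start, i, label))  # close the currently open span
            start, label = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, label = i, typ  # open a new span
    return spans


def span_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged strict-match precision, recall, and F1 over documents."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sequences")
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # exact boundary and type match
        fp += len(p - g)  # predicted spans with no exact gold match
        fn += len(g - p)  # gold spans the model missed or mis-bounded
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example: the prediction overextends the Mutation span and misses Treatment.
    gold = [["B-Cancer", "I-Cancer", "O", "B-Mutation", "O", "B-Treatment"]]
    pred = [["B-Cancer", "I-Cancer", "O", "B-Mutation", "I-Mutation", "O"]]
    p, r, f = span_prf(gold, pred)
    print((round(p, 3), round(r, 3), round(f, 3)))  # prints (0.5, 0.333, 0.4)
```

Under strict matching, a boundary error such as the overextended Mutation span counts as both a false positive and a false negative. A lenient (overlap-based) criterion would score the same output higher, so reported numbers are only comparable when the matching rule is known.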