LitMC-BERT: Transformer-Based Multi-Label Classification of Biomedical Literature With An Application on COVID-19 Literature Curation.

Publication Information

IEEE/ACM Trans Comput Biol Bioinform. 2022 Sep-Oct;19(5):2584-2595. doi: 10.1109/TCBB.2022.3173562. Epub 2022 Oct 10.

Abstract

The rapid growth of biomedical literature poses a significant challenge for curation and interpretation. This has become more evident during the COVID-19 pandemic. LitCovid, a literature database of COVID-19-related papers in PubMed, has accumulated over 200,000 articles with millions of accesses. Approximately 10,000 new articles are added to LitCovid every month. A main curation task in LitCovid is topic annotation, where an article is assigned up to eight topics, e.g., Treatment and Diagnosis. The annotated topics have been widely used both in LitCovid itself (e.g., accounting for ∼18% of total uses) and in downstream studies such as network generation. However, topic annotation has been a primary curation bottleneck due to the nature of the task and the rapid literature growth. This study proposes LITMC-BERT, a transformer-based multi-label classification method for biomedical literature. It uses a shared transformer backbone for all the labels while also capturing label-specific features and the correlations between label pairs. We compare LITMC-BERT with three baseline models on two datasets. Its micro-F1 and instance-based F1 are 5% and 4% higher than the current best results, respectively, and it requires only ∼18% of the inference time of the Binary BERT baseline. The related datasets and models are available via https://github.com/ncbi/ml-transformer.
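The design described in the abstract (one shared transformer backbone, label-specific features per topic, and scores for label-pair correlations) can be illustrated roughly as follows. This is a minimal sketch assuming PyTorch and Hugging Face `transformers`, not the authors' released implementation (see the GitHub link above); the names `LitMCBertSketch`, `NUM_LABELS`, and the linear heads are illustrative.

```python
# Minimal sketch of the idea in the abstract: a shared backbone encoded once
# per article, a small head per label, and auxiliary heads that score label
# pairs to capture co-occurrence. All names here are illustrative, not the
# paper's released code.
import itertools

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

NUM_LABELS = 8  # LitCovid assigns up to eight topics per article


class LitMCBertSketch(nn.Module):
    def __init__(self, backbone_name="bert-base-uncased", hidden=768):
        super().__init__()
        # Shared backbone: the article is encoded once and reused by every
        # label head, which is why inference is far cheaper than running a
        # separate BERT per label (the Binary BERT baseline).
        self.backbone = AutoModel.from_pretrained(backbone_name)
        # Label-specific features: one small head per label.
        self.label_heads = nn.ModuleList(
            nn.Linear(hidden, 1) for _ in range(NUM_LABELS)
        )
        # Label-pair correlations: one scorer per unordered label pair; during
        # training these could be supervised with co-occurrence targets as an
        # auxiliary loss (an assumption of this sketch).
        self.pairs = list(itertools.combinations(range(NUM_LABELS), 2))
        self.pair_heads = nn.ModuleList(
            nn.Linear(hidden, 1) for _ in self.pairs
        )

    def forward(self, input_ids, attention_mask):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] representation
        label_logits = torch.cat([h(cls) for h in self.label_heads], dim=-1)
        pair_logits = torch.cat([h(cls) for h in self.pair_heads], dim=-1)
        return label_logits, pair_logits


tok = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tok(["Remdesivir shortens recovery time in hospitalized patients."],
            return_tensors="pt", truncation=True, padding=True)
model = LitMCBertSketch()
label_logits, pair_logits = model(batch["input_ids"], batch["attention_mask"])
# Multi-label prediction: an independent sigmoid per topic, thresholded at 0.5.
pred = (torch.sigmoid(label_logits) > 0.5).int()
```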
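For reference, the two reported metrics aggregate differently: micro-F1 pools true/false positives over all (article, topic) decisions, while instance-based F1 computes an F1 per article and averages it. A toy illustration using scikit-learn (not the paper's evaluation script):

```python
# Toy multi-label matrix: 2 articles x 3 topics.
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 1]])

micro_f1 = f1_score(y_true, y_pred, average="micro")      # pooled over all cells
instance_f1 = f1_score(y_true, y_pred, average="samples") # averaged per article
print(micro_f1, instance_f1)  # both ~0.667 on this toy example
```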

Figure (from the paper): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a43a/9647722/d717de028709/chen1-3173562.jpg
