Multi-probe attention neural network for COVID-19 semantic indexing.

Affiliations

Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China.

Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China.

Publication information

BMC Bioinformatics. 2022 Jun 29;23(1):259. doi: 10.1186/s12859-022-04803-x.

Abstract

BACKGROUND

The COVID-19 pandemic has sharply accelerated the pace of scientific publication. Curating and indexing this large volume of biomedical literature efficiently is of great importance during the current crisis. Literature indexing has so far been performed mainly by human experts using Medical Subject Headings (MeSH), a process that is labor-intensive and time-consuming. To reduce this cost in time and money, automatic semantic indexing technologies are urgently needed for the emerging COVID-19 domain.

RESULTS

To investigate the semantic indexing problem for COVID-19, we first construct a new COVID-19 Semantic Indexing dataset consisting of more than 80 thousand biomedical articles. We then propose a novel semantic indexing framework based on the multi-probe attention neural network (MPANN). Specifically, we employ a k-nearest neighbour based MeSH masking approach to generate candidate topic terms for each input article. We encode the selected candidate terms, together with other contextual information, and feed them as probes into the downstream attention-based neural network. Each semantic probe carries a specific aspect of biomedical knowledge and provides informative, discriminative features for the input article. After extracting semantic features at both the term level and the document level through the attention-based neural network, MPANN uses a linear multi-view classifier to make the final topic prediction for COVID-19 semantic indexing.
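The candidate-generation and probing steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the similarity measure, the value of k, how neighbour terms are ranked, or the form of attention used. The sketch assumes cosine similarity over precomputed article vectors, frequency-based ranking of neighbour MeSH terms, and plain dot-product attention. The function names `knn_mesh_candidates` and `probe_attention`, the data layout, and the toy vectors are all hypothetical. The linear multi-view classifier is not shown.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_mesh_candidates(query_vec, indexed_articles, k=2, max_terms=5):
    """Pool MeSH terms from the k indexed articles most similar to the query.

    indexed_articles is a list of (vector, [MeSH terms]) pairs (assumed layout).
    Terms that do not appear in the returned list are treated as masked,
    i.e. they are not passed on as candidates.
    """
    neighbours = sorted(
        indexed_articles,
        key=lambda item: cosine(query_vec, item[0]),
        reverse=True,
    )[:k]
    # Rank candidate terms by how many neighbours carry them (an assumption).
    counts = Counter(term for _, terms in neighbours for term in terms)
    return [term for term, _ in counts.most_common(max_terms)]


def probe_attention(probe_vec, token_vecs):
    """Dot-product attention: one probe vector attends over token vectors.

    Returns the attention-weighted average of the token vectors, which serves
    as the feature extracted for this probe.
    """
    scores = [sum(p * t for p, t in zip(probe_vec, tok)) for tok in token_vecs]
    peak = max(scores)  # subtract the maximum for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(token_vecs[0])
    return [sum(w * tok[d] for w, tok in zip(weights, token_vecs)) for d in range(dim)]


# Toy usage with made-up 2-dimensional vectors.
corpus = [
    ([1.0, 0.0], ["COVID-19", "SARS-CoV-2"]),
    ([0.9, 0.1], ["COVID-19", "Pandemics"]),
    ([0.0, 1.0], ["Neoplasms"]),
]
candidates = knn_mesh_candidates([1.0, 0.05], corpus, k=2)
feature = probe_attention([10.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(candidates)
print(feature)
```

In this toy run the two nearest neighbours both carry the term COVID-19, so it ranks first among the candidates, while Neoplasms is masked out because its article is not among the neighbours. In a full pipeline each candidate term would be encoded as its own probe, and the per-probe features would be combined before classification.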

CONCLUSION

The experimental results suggest that MPANN is a promising model for representing the semantic features of biomedical texts and is effective in predicting semantic topics for COVID-19-related biomedical articles.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/973b/9241329/c62723065fb9/12859_2022_4803_Fig1_HTML.jpg
