Is metadata of articles about COVID-19 enough for multilabel topic classification task?

Affiliations

College of Economics and Management, Beijing University of Technology, No. 100 PingLeYuan, Chaoyang District, Beijing 100124, P.R. China.

Institute of Scientific and Technical Information of China, No. 15 Fuxing Road, Haidian District, Beijing 100038, P.R. China.

Publication information

Database (Oxford). 2024 Oct 21;2024. doi: 10.1093/database/baae106.

DOI:10.1093/database/baae106
PMID:39432499
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11492800/

Abstract

The ever-increasing volume of COVID-19-related articles presents a significant challenge for the manual curation and multilabel topic classification of LitCovid. To address this, this study develops a novel multilabel topic classification framework that accounts for both the correlation and the imbalance of topic labels while strengthening the pretrained model. Using this framework, the study seeks to answer the following question: do the full texts, MeSH (Medical Subject Headings), and biological entities of articles about COVID-19 encode more discriminative information than their metadata (title, abstract, keywords, and journal name)? Extensive experiments on our enriched version of the BC7-LitCovid corpus and on the Hallmarks of Cancer corpus support the following conclusions. Our framework demonstrates superior performance and robustness. The metadata of scientific publications about COVID-19 carries valuable information for multilabel topic classification. Compared to biological entities, full texts and MeSH can further enhance the performance of our framework for multilabel topic classification, but the improvement is very limited. Database URL: https://github.com/pzczxs/Enriched-BC7-LitCovid.
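
As a rough illustration of the setting the abstract describes, and not the authors' framework, the sketch below joins the four metadata fields into a single classifier input and scores multilabel predictions with micro- and macro-averaged F1. Macro-averaging weights every topic equally, which is one common way to make label imbalance visible. The field names, the `[SEP]` separator, the helper names `build_input` and `multilabel_f1`, and the toy labels and predictions are all assumptions made for this example; the abstract does not state the input format or the evaluation metrics.

```python
from typing import Dict, List, Set

# Assumed field names for the metadata mentioned in the abstract.
METADATA_FIELDS = ("title", "abstract", "keywords", "journal")


def build_input(record: Dict[str, str], sep: str = " [SEP] ") -> str:
    """Join the non-empty metadata fields into one text input (hypothetical format)."""
    parts = [record.get(field, "").strip() for field in METADATA_FIELDS]
    return sep.join(part for part in parts if part)


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(gold: List[Set[str]], pred: List[Set[str]],
                  labels: List[str]) -> Dict[str, float]:
    """Micro- and macro-averaged F1 over per-document label sets."""
    counts = {label: [0, 0, 0] for label in labels}  # [tp, fp, fn] per label
    for gold_set, pred_set in zip(gold, pred):
        for label in labels:
            if label in pred_set and label in gold_set:
                counts[label][0] += 1
            elif label in pred_set:
                counts[label][1] += 1
            elif label in gold_set:
                counts[label][2] += 1
    # Micro: pool counts over all labels, so frequent labels dominate.
    micro = f1(sum(c[0] for c in counts.values()),
               sum(c[1] for c in counts.values()),
               sum(c[2] for c in counts.values()))
    # Macro: average per-label F1, so rare labels count as much as frequent ones.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    return {"micro_f1": micro, "macro_f1": macro}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    labels = ["Treatment", "Diagnosis", "Prevention"]
    gold = [{"Treatment"}, {"Diagnosis", "Treatment"}, {"Prevention"}]
    pred = [{"Treatment"}, {"Diagnosis"}, {"Prevention", "Treatment"}]
    print(build_input({"title": "A", "abstract": "B", "keywords": "", "journal": "J"}))
    print(multilabel_f1(gold, pred, labels))
```

In this toy case the single weak label pulls the micro score down further than the macro score, because two of the three labels are predicted perfectly; on a corpus with rare topics the gap typically runs the other way.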

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd5a/11492800/db65d8728366/baae106fa3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd5a/11492800/8abc8b019f7b/baae106f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd5a/11492800/34cd403ed1fd/baae106f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd5a/11492800/07449f1daacd/baae106f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd5a/11492800/51efbd186b13/baae106f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd5a/11492800/c3c8e44be5e9/baae106f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd5a/11492800/cd0f72f29a5c/baae106f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd5a/11492800/24a6f5d12d1e/baae106f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd5a/11492800/3217da548aff/baae106f8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd5a/11492800/796e030f5f71/baae106f9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd5a/11492800/cf36f44d30e8/baae106f10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd5a/11492800/3463a833f0b0/baae106f11.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd5a/11492800/438dffe6fb73/baae106f12.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd5a/11492800/df8b2e8095fc/baae106fa1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd5a/11492800/fb4b26f9527f/baae106fa2.jpg

Similar articles

1. Is metadata of articles about COVID-19 enough for multilabel topic classification task?
Database (Oxford). 2024 Oct 21;2024. doi: 10.1093/database/baae106.
2. NLM-Chem-BC7: manually annotated full-text resources for chemical entity annotation and indexing in biomedical articles.
Database (Oxford). 2022 Dec 1;2022. doi: 10.1093/database/baac102.
3. Patient-Related Metadata Reported in Sequencing Studies of SARS-CoV-2: Protocol for a Scoping Review and Bibliometric Analysis.
JMIR Res Protoc. 2025 Apr 22;14:e58567. doi: 10.2196/58567.
4. LitCovid: an open database of COVID-19 literature.
Nucleic Acids Res. 2021 Jan 8;49(D1):D1534-D1540. doi: 10.1093/nar/gkaa952.
5. Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations.
Database (Oxford). 2022 Aug 31;2022. doi: 10.1093/database/baac069.
6. GEOMetaCuration: a web-based application for accurate manual curation of Gene Expression Omnibus metadata.
Database (Oxford). 2018 Jan 1;2018. doi: 10.1093/database/bay019.
7. Automated annotation of scientific texts for ML-based keyphrase extraction and validation.
Database (Oxford). 2024 Sep 27;2024. doi: 10.1093/database/baae093.
8. LitCovid ensemble learning for COVID-19 multi-label classification.
Database (Oxford). 2022 Nov 25;2022. doi: 10.1093/database/baac103.
9. LitMC-BERT: Transformer-Based Multi-Label Classification of Biomedical Literature With An Application on COVID-19 Literature Curation.
IEEE/ACM Trans Comput Biol Bioinform. 2022 Sep-Oct;19(5):2584-2595. doi: 10.1109/TCBB.2022.3173562. Epub 2022 Oct 10.
10. A BERT-based ensemble learning approach for the BioCreative VII challenges: full-text chemical identification and multi-label classification in PubMed articles.
Database (Oxford). 2022 Jul 15;2022. doi: 10.1093/database/baac056.
