Enhancing Automatic PT Tagging for MEDLINE Citations Using Transformer-Based Models.

Authors

Cid Victor H, Mork James

Affiliation

National Library of Medicine, Bethesda, Maryland, US.

Publication

ArXiv. 2025 Jun 3:arXiv:2506.03321v1.

PMID: 40735093
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12306818/

Abstract

We investigated the feasibility of predicting Medical Subject Headings (MeSH) Publication Types (PTs) from MEDLINE citation metadata using pre-trained Transformer-based models BERT and DistilBERT. This study addresses limitations in the current automated indexing process, which relies on legacy NLP algorithms. We evaluated monolithic multi-label classifiers and binary classifier ensembles to enhance the retrieval of biomedical literature. Results demonstrate the potential of Transformer models to significantly improve PT tagging accuracy, paving the way for scalable, efficient biomedical indexing.
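The abstract contrasts two designs: a single (monolithic) multi-label classifier and an ensemble of per-PT binary classifiers. The minimal Python sketch below illustrates only the decision step of each design, assuming a sigmoid-per-label output head and a fixed 0.5 threshold. The PT subset, the threshold, the toy keyword "models" standing in for fine-tuned BERT/DistilBERT classifiers, and micro-F1 as the evaluation metric are all hypothetical illustrations, not the authors' code or their reported evaluation.

```python
"""Minimal sketch: turning classifier scores into MeSH Publication Type (PT) tags.

Hypothetical throughout: PT_LABELS, the 0.5 threshold, the toy keyword
"models", and micro-F1 as the metric are illustrations, not the paper's code.
"""
import math
from typing import Callable, Dict, List, Set

# Hypothetical subset of MeSH Publication Types used as the label space.
PT_LABELS = ["Journal Article", "Review", "Randomized Controlled Trial", "Case Reports"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_multilabel(logits: List[float], threshold: float = 0.5) -> Set[str]:
    """Monolithic multi-label head: one logit per PT from a single model.

    Each logit gets its own sigmoid (not a softmax), so any number of PTs,
    including several at once, can be assigned to one citation.
    """
    if len(logits) != len(PT_LABELS):
        raise ValueError("expected one logit per PT label")
    return {label for label, z in zip(PT_LABELS, logits) if sigmoid(z) >= threshold}


def decode_binary_ensemble(
    classifiers: Dict[str, Callable[[str], float]], text: str, threshold: float = 0.5
) -> Set[str]:
    """Binary ensemble: one separately trained yes/no model per PT.

    Each classifier maps citation text to P(PT applies); the PTs whose
    probability reaches the threshold form the predicted tag set.
    """
    return {label for label, clf in classifiers.items() if clf(text) >= threshold}


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over all (citation, PT) decisions.

    Returns 0.0 when there are no true positives (including the all-empty case).
    """
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Single model, four logits -> sigmoids of about 0.95, 0.12, 0.77, 0.02.
    print(sorted(decode_multilabel([3.0, -2.0, 1.2, -4.0])))
    # -> ['Journal Article', 'Randomized Controlled Trial']

    # Toy stand-ins for fine-tuned per-PT BERT/DistilBERT binary classifiers.
    toy_models = {
        "Review": lambda t: 0.9 if "review" in t.lower() else 0.1,
        "Randomized Controlled Trial": lambda t: 0.85 if "randomized" in t.lower() else 0.05,
    }
    print(sorted(decode_binary_ensemble(toy_models, "A randomized trial of drug X")))
    # -> ['Randomized Controlled Trial']

    gold = [{"Journal Article", "Randomized Controlled Trial"}]
    print(round(micro_f1(gold, [{"Journal Article"}]), 3))
    # -> 0.667
```

In general, the monolithic design scores every PT in one encoder pass, while a binary ensemble runs one model per PT; the ensemble costs more at inference time but lets each PT be trained, thresholded, and updated independently.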


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c6b/12306818/7aef44a32fed/nihpp-2506.03321v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c6b/12306818/adc98d31d2b2/nihpp-2506.03321v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c6b/12306818/8826ac5da4bc/nihpp-2506.03321v1-f0003.jpg

Similar articles

1. Enhancing Automatic PT Tagging for MEDLINE Citations Using Transformer-Based Models.
   ArXiv. 2025 Jun 3:arXiv:2506.03321v1.
2. Trajectory-Ordered Objectives for Self-Supervised Representation Learning of Temporal Healthcare Data Using Transformers: Model Development and Evaluation Study.
   JMIR Med Inform. 2025 Jun 4;13:e68138. doi: 10.2196/68138.
3. Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset.
   IEEE J Transl Eng Health Med. 2025 Jun 4;13:261-274. doi: 10.1109/JTEHM.2025.3576570. eCollection 2025.
4. Predicting Drug-Side Effect Relationships From Parametric Knowledge Embedded in Biomedical BERT Models: Methodological Study With a Natural Language Processing Approach.
   JMIR Med Inform. 2025 Jul 10;13:e67513. doi: 10.2196/67513.
5. Proposal for Using AI to Assess Clinical Data Integrity and Generate Metadata: Algorithm Development and Validation.
   JMIR Med Inform. 2025 Jun 30;13:e60204. doi: 10.2196/60204.
6. Identify diabetic retinopathy-related clinical concepts and their attributes using transformer-based natural language processing methods.
   BMC Med Inform Decis Mak. 2022 Sep 27;22(Suppl 3):255. doi: 10.1186/s12911-022-01996-2.
7. Identifying artificial intelligence-generated content using the DistilBERT transformer and NLP techniques.
   Sci Rep. 2025 Jul 1;15(1):20366. doi: 10.1038/s41598-025-08208-7.
8. Search strategies to identify diagnostic accuracy studies in MEDLINE and EMBASE.
   Cochrane Database Syst Rev. 2013 Sep 11;2013(9):MR000022. doi: 10.1002/14651858.MR000022.pub3.
9. Enhancing Clinical Relevance of Pretrained Language Models Through Integration of External Knowledge: Case Study on Cardiovascular Diagnosis From Electronic Health Records.
   JMIR AI. 2024 Aug 6;3:e56932. doi: 10.2196/56932.
10. Radiology report generation using automatic keyword adaptation, frequency-based multi-label classification and text-to-text large language models.
    Comput Biol Med. 2025 Jul 3;196(Pt A):110625. doi: 10.1016/j.compbiomed.2025.110625.
