Design of a generic, open platform for machine learning-assisted indexing and clustering of articles in PubMed, a biomedical bibliographic database.

Author information

Smalheiser Neil R, Cohen Aaron M

Affiliations

Department of Psychiatry and Psychiatric Institute, University of Illinois College of Medicine, 1601 West Taylor Street, MC912, Chicago, IL 60612

Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA 97239.

Publication information

Data Inf Manag. 2018 Jun;2(1):27-36. doi: 10.2478/dim-2018-0004. Epub 2018 May 22.

DOI:10.2478/dim-2018-0004
PMID:30766970
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6372120/
Abstract

Many investigators have carried out text mining of the biomedical literature for a variety of purposes, ranging from the assignment of indexing terms to the disambiguation of author names. A common approach is to define positive and negative training examples, extract features from article metadata, and employ machine learning algorithms. At present, each research group tackles each problem from scratch and in isolation from other projects, which causes redundancy and a great waste of effort. Here, we propose and describe the design of a generic platform for biomedical text mining, which can serve as a shared resource for machine learning projects and as a public repository for their outputs. We will initially focus on a specific goal, namely, classifying articles according to Publication Type, and emphasize how feature sets can be made more powerful and robust through the use of multiple, heterogeneous similarity measures as input to machine learning models. We then discuss how the generic platform can be extended to include a wide variety of other machine learning-based goals and projects, and how it can also be used as a public platform for disseminating the results of NLP tools to end-users.
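
The workflow the abstract outlines (positive and negative training examples, features from article metadata, several heterogeneous similarity measures fed to a learner) can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not the authors' platform: the toy records, the three similarity measures (title-word overlap, MeSH-term overlap, journal match), the helper names, and the plain logistic regression used as the model.

```python
import math

def jaccard(a, b):
    """Jaccard overlap between two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_features(article, positives):
    """Mean similarity of `article` to the positive examples under three
    heterogeneous measures: title words, MeSH terms, journal identity."""
    refs = [p for p in positives if p is not article]  # skip self-similarity
    n = len(refs)
    title = sum(jaccard(article["title"].lower().split(),
                        p["title"].lower().split()) for p in refs) / n
    mesh = sum(jaccard(article["mesh"], p["mesh"]) for p in refs) / n
    journal = sum(article["journal"] == p["journal"] for p in refs) / n
    return [title, mesh, journal]

def predict_proba(w, b, x):
    """Logistic model: probability that feature vector x is a positive."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights by plain stochastic gradient descent on log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = predict_proba(w, b, xi) - yi  # gradient of log loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def rec(title, mesh, journal):
    """Build a hypothetical article-metadata record."""
    return {"title": title, "mesh": mesh, "journal": journal}

# Hypothetical toy data: positives resemble randomized controlled trials.
positives = [
    rec("randomized controlled trial of aspirin",
        ["Humans", "Aspirin", "Double-Blind Method"], "J Trials"),
    rec("a randomized trial of statin therapy",
        ["Humans", "Double-Blind Method", "Placebos"], "J Trials"),
    rec("placebo controlled randomized trial of metformin",
        ["Humans", "Placebos", "Double-Blind Method"], "Clin Med"),
]
negatives = [
    rec("review of neural development in mice",
        ["Animals", "Mice", "Neurons"], "Neuro Rev"),
    rec("case report of rare skin lesion",
        ["Humans", "Skin Diseases"], "Derm Cases"),
    rec("genome assembly methods overview",
        ["Genomics", "Software"], "Bioinf"),
]

X = [similarity_features(a, positives) for a in positives + negatives]
y = [1] * len(positives) + [0] * len(negatives)
w, b = train_logistic(X, y)

# Score two unseen, equally hypothetical records.
new_trial = rec("randomized trial of ibuprofen versus placebo",
                ["Humans", "Double-Blind Method", "Placebos"], "J Trials")
new_other = rec("overview of mouse neuron imaging software",
                ["Animals", "Mice", "Software"], "Bioinf")
p_trial = predict_proba(w, b, similarity_features(new_trial, positives))
p_other = predict_proba(w, b, similarity_features(new_other, positives))
print(p_trial > 0.5, p_other > 0.5)
```

Each similarity measure becomes one input dimension, so a measure that is uninformative for a given record (for example, an unfamiliar journal) can be offset by the others. This is one simple reading of how heterogeneous measures might make a feature set more robust.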
