pubmed.mineR: an R package with text-mining algorithms to analyse PubMed abstracts.

Authors

Rani Jyoti, Shah A B Rauf, Ramachandran Srinivasan

Affiliation

GN Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, New Delhi 110 025, India.

Publication information

J Biosci. 2015 Oct;40(4):671-82. doi: 10.1007/s12038-015-9552-2.

DOI: 10.1007/s12038-015-9552-2
PMID: 26564970

Abstract

The PubMed literature database is a valuable source of information for scientific research. It is rich in biomedical literature with more than 24 million citations. Data-mining of voluminous literature is a challenging task. Although several text-mining algorithms have been developed in recent years with a focus on data visualization, they have limitations: they are slow, rigid, and not available as open source. We have developed an R package, pubmed.mineR, in which we have combined the advantages of existing algorithms, overcome their limitations, and offer user flexibility and links with other packages in Bioconductor and the Comprehensive R Archive Network (CRAN) in order to expand the user's capabilities for executing multifaceted approaches. Three case studies are presented, namely 'Evolving role of diabetes educators', 'Cancer risk assessment' and 'Dynamic concepts on disease and comorbidity', to illustrate the use of pubmed.mineR. The package generally runs fast, with small elapsed times on regular workstations, even on large corpus sizes and with compute-intensive functions. pubmed.mineR is available at http://cran.r-project.org/web/packages/pubmed.mineR.
