Rummagene: massive mining of gene sets from supporting materials of biomedical research publications.

Affiliation

Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.

Publication information

Commun Biol. 2024 Apr 20;7(1):482. doi: 10.1038/s42003-024-06177-7.

DOI:10.1038/s42003-024-06177-7
PMID:38643247
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11032387/
Abstract

Many biomedical research publications contain gene sets in their supporting tables, and these sets are currently not available for search and reuse. By crawling PubMed Central, the Rummagene server provides access to hundreds of thousands of such mammalian gene sets. So far, we scanned 5,448,589 articles to find 121,237 articles that contain 642,389 gene sets. These sets are served for enrichment analysis, free text, and table title search. Investigating statistical patterns within the Rummagene database, we demonstrate that Rummagene can be used for transcription factor and kinase enrichment analyses, and for gene function predictions. By combining gene set similarity with abstract similarity, Rummagene can find surprising relationships between biological processes, concepts, and named entities. Overall, Rummagene brings to the surface the ability to search a massive collection of published biomedical datasets that are currently buried and inaccessible. The Rummagene web application is available at https://rummagene.com .
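The abstract states that the mined sets are served for enrichment analysis. A common way to score a query gene list against a library of gene sets is a one-sided Fisher's exact (hypergeometric) overlap test followed by Benjamini-Hochberg correction. The sketch below illustrates that generic approach using only the Python standard library. It is not Rummagene's own implementation; the function names (`enrich`, `hypergeom_sf`, `benjamini_hochberg`), the toy gene names, and the background size are hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Set, Tuple

# Generic over-representation analysis sketch (NOT Rummagene's own code).
# All function names and the toy data below are hypothetical.


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N = background (universe) size, K = library set size, n = query size.
    This upper-tail probability equals the one-sided Fisher's exact test
    p-value for over-representation of the overlap.
    """
    lo = max(k, 0, n - (N - K))  # smallest feasible overlap that is >= k
    hi = min(K, n)  # largest feasible overlap
    if lo > hi:
        return 0.0  # an overlap of k or more is impossible
    log_denom = log_comb(N, n)
    logs = [log_comb(K, i) + log_comb(N - K, n - i) - log_denom
            for i in range(lo, hi + 1)]
    top = max(logs)  # log-sum-exp trick for numerical stability
    return min(1.0, math.exp(top) * sum(math.exp(x - top) for x in logs))


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


def enrich(query: Set[str], library: Dict[str, Set[str]],
           background_size: int) -> List[Tuple[str, int, float, float]]:
    """Return (term, overlap, p, q) for library sets sharing >= 1 gene, sorted by p."""
    n = len(query)
    rows = []
    for term, gene_set in library.items():
        k = len(query & gene_set)
        rows.append((term, k, hypergeom_sf(k, background_size, len(gene_set), n)))
    # Correct over every tested set, not only the overlapping ones.
    qvals = benjamini_hochberg([p for _, _, p in rows])
    results = [(t, k, p, q) for (t, k, p), q in zip(rows, qvals) if k > 0]
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Toy library and query with made-up gene names; background size 20 is arbitrary.
    toy_library = {
        "table_A": {"g1", "g2", "g3", "g4", "g5"},
        "table_B": {"g10", "g11"},
    }
    for term, k, p, q in enrich({"g1", "g2", "g3", "g6"}, toy_library, background_size=20):
        print(f"{term}\toverlap={k}\tp={p:.3g}\tq={q:.3g}")
```

Applying the correction across all library sets, rather than only those that overlap the query, avoids understating the number of tests. A service scanning hundreds of thousands of sets would likely vectorize this loop or precompute log-factorials; that optimization is omitted here for clarity.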

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00e1/11032387/4ae9f79c796c/42003_2024_6177_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00e1/11032387/3f899c16eaa8/42003_2024_6177_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00e1/11032387/c9ac3e38322d/42003_2024_6177_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00e1/11032387/3ce1b9ac1a6e/42003_2024_6177_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00e1/11032387/078d894c5084/42003_2024_6177_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00e1/11032387/12aaf7d7d6a1/42003_2024_6177_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00e1/11032387/1be46372e680/42003_2024_6177_Fig7_HTML.jpg

Similar articles

1
Rummagene: massive mining of gene sets from supporting materials of biomedical research publications.
Commun Biol. 2024 Apr 20;7(1):482. doi: 10.1038/s42003-024-06177-7.
2
BioReader: a text mining tool for performing classification of biomedical literature.
BMC Bioinformatics. 2019 Feb 4;19(Suppl 13):57. doi: 10.1186/s12859-019-2607-x.
3
Cloud-Based Phrase Mining and Analysis of User-Defined Phrase-Category Association in Biomedical Publications.
J Vis Exp. 2019 Feb 23;(144). doi: 10.3791/59108.
4
RummaGEO: Automatic Mining of Human and Mouse Gene Sets from GEO.
bioRxiv. 2024 Apr 13:2024.04.09.588712. doi: 10.1101/2024.04.09.588712.
5
Text mining-based word representations for biomedical data analysis and protein-protein interaction networks in machine learning tasks.
PLoS One. 2021 Oct 15;16(10):e0258623. doi: 10.1371/journal.pone.0258623. eCollection 2021.
6
MPTM: A tool for mining protein post-translational modifications from literature.
J Bioinform Comput Biol. 2017 Oct;15(5):1740005. doi: 10.1142/S0219720017400054. Epub 2017 Sep 11.
7
BioConceptVec: Creating and evaluating literature-based biomedical concept embeddings on a large scale.
PLoS Comput Biol. 2020 Apr 23;16(4):e1007617. doi: 10.1371/journal.pcbi.1007617. eCollection 2020 Apr.
8
QTLTableMiner: semantic mining of QTL tables in scientific articles.
BMC Bioinformatics. 2018 May 25;19(1):183. doi: 10.1186/s12859-018-2165-7.
9
Biomedical named entity recognition and linking datasets: survey and our recent development.
Brief Bioinform. 2020 Dec 1;21(6):2219-2238. doi: 10.1093/bib/bbaa054.
10
NLM-Chem-BC7: manually annotated full-text resources for chemical entity annotation and indexing in biomedical articles.
Database (Oxford). 2022 Dec 1;2022. doi: 10.1093/database/baac102.

Cited by

1
MicrobiomeKG: bridging microbiome research and host health through knowledge graphs.
Front Syst Biol. 2025 Aug 29;5:1544432. doi: 10.3389/fsysb.2025.1544432. eCollection 2025.
2
Knowledge-guided Contextual Gene Set Analysis Using Large Language Models.
ArXiv. 2025 Jun 4:arXiv:2506.04303v1.
3
A Gene Set Foundation Model Pre-Trained on a Massive Collection of Diverse Gene Sets.
bioRxiv. 2025 Jun 2:2025.05.30.657124. doi: 10.1101/2025.05.30.657124.
4
pubCounteR: an R package for interrogating published literature for experimentally-derived gene lists within a user-defined biological context.
Front Bioinform. 2025 May 6;5:1523184. doi: 10.3389/fbinf.2025.1523184. eCollection 2025.
5
L2S2: chemical perturbation and CRISPR KO LINCS L1000 signature search engine.
Nucleic Acids Res. 2025 Jul 7;53(W1):W338-W350. doi: 10.1093/nar/gkaf373.
6
GeneSetCart: assembling, augmenting, combining, visualizing, and analyzing gene sets.
Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf025.
7
Gene module-trait network analysis uncovers cell type specific systems and genes relevant to Alzheimer's Disease.
bioRxiv. 2025 Feb 1:2025.01.31.635970. doi: 10.1101/2025.01.31.635970.
8
Chromosomal and gonadal sex have differing effects on social motivation in mice.
Biol Sex Differ. 2025 Feb 19;16(1):13. doi: 10.1186/s13293-025-00690-y.
9
C-terminal amides mark proteins for degradation via SCF-FBXO31.
Nature. 2025 Feb;638(8050):519-527. doi: 10.1038/s41586-024-08475-w. Epub 2025 Jan 29.
10
Potential Adaptive Introgression From Dogs in Iberian Grey Wolves (Canis lupus).
Mol Ecol. 2025 Jun;34(12):e17639. doi: 10.1111/mec.17639. Epub 2025 Jan 10.