simplifyEnrichment: A Bioconductor Package for Clustering and Visualizing Functional Enrichment Results.

Author affiliations

Molecular Precision Oncology Program, National Center for Tumor Diseases (NCT) Heidelberg, D-69120 Heidelberg, Germany.

Molecular Precision Oncology Program, National Center for Tumor Diseases (NCT) Heidelberg, D-69120 Heidelberg, Germany; Heidelberg Institute of Stem Cell Technology and Experimental Medicine (HI-STEM), D-69120 Heidelberg, Germany; German Cancer Consortium (DKTK), D-69120 Heidelberg, Germany; Department of Pediatric Immunology, Hematology and Oncology, University Hospital Heidelberg, D-69120 Heidelberg, Germany.

Publication information

Genomics Proteomics Bioinformatics. 2023 Feb;21(1):190-202. doi: 10.1016/j.gpb.2022.04.008. Epub 2022 Jun 6.

DOI: 10.1016/j.gpb.2022.04.008
PMID: 35680096
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10373083/
Abstract

Functional enrichment analysis or gene set enrichment analysis is a basic bioinformatics method that evaluates the biological importance of a list of genes of interest. However, it may produce a long list of significant terms with highly redundant information that is difficult to summarize. Current tools to simplify enrichment results by clustering them into groups either still produce redundancy between clusters or do not retain consistent term similarities within clusters. We propose a new method named binary cut for clustering similarity matrices of functional terms. Through comprehensive benchmarks on both simulated and real-world datasets, we demonstrated that binary cut could efficiently cluster functional terms into groups where terms showed consistent similarities within groups and were mutually exclusive between groups. We compared binary cut clustering on the similarity matrices obtained from different similarity measures and found that semantic similarity worked well with binary cut, while similarity matrices based on gene overlap showed less consistent patterns. We implemented the binary cut algorithm in the R package simplifyEnrichment, which additionally provides functionalities for visualizing, summarizing, and comparing the clustering. The simplifyEnrichment package and the documentation are available at https://bioconductor.org/packages/simplifyEnrichment/.
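
The abstract names binary cut but gives no algorithmic detail. As a rough illustration of the general idea of recursively splitting a similarity matrix into two groups, the following Python sketch may help. It is not the simplifyEnrichment implementation: the seed-based split rule, the median-based separation score, the 0.85 cutoff, and all function names are assumptions chosen for this example.

```python
from statistics import median


def block_median(sim, rows, cols):
    """Median similarity over the block of `sim` given by rows x cols."""
    return median(sim[i][j] for i in rows for j in cols)


def split_in_two(sim, items):
    """Split items around the two least similar members (illustrative rule)."""
    seed_a, seed_b = min(
        ((a, b) for a in items for b in items if a < b),
        key=lambda pair: sim[pair[0]][pair[1]],
    )
    group_a, group_b = [], []
    for i in items:
        # Each item joins the seed it is more similar to.
        (group_a if sim[i][seed_a] >= sim[i][seed_b] else group_b).append(i)
    return group_a, group_b


def separation_score(sim, group_a, group_b):
    """Share of the block medians that lies within groups (assumed score)."""
    s11 = block_median(sim, group_a, group_a)
    s22 = block_median(sim, group_b, group_b)
    s12 = block_median(sim, group_a, group_b)
    s21 = block_median(sim, group_b, group_a)
    total = s11 + s12 + s21 + s22
    return (s11 + s22) / total if total else 0.0


def recursive_cut(sim, items=None, cutoff=0.85):
    """Keep splitting a group in two while the split separates it well.

    `cutoff` is an assumed threshold, not a value stated in the abstract.
    """
    if items is None:
        items = list(range(len(sim)))
    if len(items) < 2:
        return [items]
    group_a, group_b = split_in_two(sim, items)
    if not group_a or not group_b:
        return [items]
    if separation_score(sim, group_a, group_b) < cutoff:
        return [items]  # the split is not clean enough; keep one cluster
    return recursive_cut(sim, group_a, cutoff) + recursive_cut(sim, group_b, cutoff)


# Toy symmetric similarity matrix: terms 0-1 and terms 2-3 form two blocks.
similarity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
clusters = recursive_cut(similarity)
print(clusters)  # [[0, 1], [2, 3]]
```

On the toy matrix the first split separates the two blocks cleanly, while further splits inside each block score below the cutoff and are rejected, so terms that are consistently similar stay together.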


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de43/10373083/b2f0609552a1/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de43/10373083/9cad8e812c8a/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de43/10373083/46e6db29abf4/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de43/10373083/39cd8de49270/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de43/10373083/78cb7e93396b/gr5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de43/10373083/0b6bf2ddd3a1/gr6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de43/10373083/dbcf7b45aac6/gr7.jpg
