orsum: a Python package for filtering and comparing enrichment analyses using a simple principle.

Affiliations

Aix Marseille University, Inserm, MMG, Marseille, France.

Barcelona Supercomputing Center (BSC), Barcelona, Spain.

Publication information

BMC Bioinformatics. 2022 Jul 23;23(1):293. doi: 10.1186/s12859-022-04828-2.

DOI:10.1186/s12859-022-04828-2
PMID:35870894
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9308244/
Abstract

BACKGROUND

Enrichment analyses are widely applied to investigate lists of genes of interest. However, such analyses often result in long lists of annotation terms with high redundancy, making the interpretation and reporting difficult. Long annotation lists and redundancy also complicate the comparison of results obtained from different enrichment analyses. An approach to overcome these issues is using down-sized annotation collections composed of non-redundant terms. However, down-sized collections are generic and the level of detail may not fit the user's study. Other available approaches include clustering and filtering tools, which are based on similarity measures and thresholds that can be complicated to comprehend and set.

RESULT

We propose orsum, a Python package to filter enrichment results. orsum can filter multiple enrichment results collectively and highlight common and specific annotation terms. Filtering in orsum is based on a simple principle: a term is discarded if there is a more significant term that annotates at least the same genes; the remaining more significant term becomes the representative term for the discarded term. This principle ensures that the main biological information is preserved in the filtered results while reducing redundancy. In addition, as the representative terms are selected from the original enrichment results, orsum outputs filtered terms tailored to the study. As a use case, we applied orsum to the enrichment analyses of four lists of genes, each associated with a neurodegenerative disease.
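The filtering principle stated above can be illustrated with a minimal sketch. This is not orsum's own code or API: the function name `filter_terms`, the input layout (terms already sorted from most to least significant, each paired with the set of genes it annotates), and the toy term and gene names are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple


def filter_terms(
    ranked_terms: List[Tuple[str, FrozenSet[str]]],
) -> Dict[str, List[str]]:
    """Filter terms that are sorted from most to least significant.

    A term is discarded when an already-kept (more significant) term
    annotates at least the same genes. Returns a mapping from each
    representative term to the list of terms it represents.
    """
    represented: Dict[str, List[str]] = {}
    kept: List[Tuple[str, FrozenSet[str]]] = []
    for term, genes in ranked_terms:
        for rep, rep_genes in kept:
            # Subset test: the kept term covers every gene of this term.
            if genes <= rep_genes:
                represented[rep].append(term)
                break
        else:
            # No more significant term covers this one, so it is kept.
            kept.append((term, genes))
            represented[term] = []
    return represented


# Hypothetical toy input, most significant term first.
ranked = [
    ("immune response", frozenset({"G1", "G2", "G3", "G4"})),
    ("T cell activation", frozenset({"G1", "G2"})),
    ("lipid metabolism", frozenset({"G5", "G6"})),
    ("immune system process", frozenset({"G1", "G2", "G3", "G4", "G7"})),
]
result = filter_terms(ranked)
```

In this toy input, "T cell activation" is discarded and represented by "immune response", while "immune system process" is kept: it is less significant but annotates a gene that no more significant term covers.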

CONCLUSION

orsum provides a comprehensible and effective way of filtering and comparing enrichment results. It is available at https://anaconda.org/bioconda/orsum.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a5e/9308244/6aa137803c1d/12859_2022_4828_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a5e/9308244/21636e349203/12859_2022_4828_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a5e/9308244/7b8e7acbea69/12859_2022_4828_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a5e/9308244/48be54b36661/12859_2022_4828_Fig4_HTML.jpg
