

Reactome pathway analysis: a high-performance in-memory approach.

Author information

Fabregat Antonio, Sidiropoulos Konstantinos, Viteri Guilherme, Forner Oscar, Marin-Garcia Pablo, Arnau Vicente, D'Eustachio Peter, Stein Lincoln, Hermjakob Henning

Affiliations

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK.

Open Targets, Wellcome Genome Campus, Hinxton, UK.

Publication information

BMC Bioinformatics. 2017 Mar 2;18(1):142. doi: 10.1186/s12859-017-1559-2.

Abstract

BACKGROUND

Reactome aims to provide bioinformatics tools for visualisation, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modelling, systems biology and education. Pathway analysis methods have a broad range of applications in physiological and biomedical research; from a performance point of view, one of their main problems is the constantly increasing size of the data samples.

RESULTS

Here, we present a new high-performance in-memory implementation of the well-established over-representation analysis method. To achieve this, the method is divided into four steps and, for each of them, specific data structures are used to improve performance and minimise the memory footprint. The first step, finding out whether an identifier in the user's sample corresponds to an entity in Reactome, is addressed using a radix tree as a lookup table. The second step, modelling the proteins, chemicals, their orthologues in other species and their composition in complexes and sets, is addressed with a graph. The third and fourth steps, which aggregate the results and calculate the statistics, are solved with a doubly linked tree.
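As a rough illustration of the first and last steps, the sketch below maps sample identifiers to entities with a prefix tree and then scores one pathway for over-representation. It is not the Reactome implementation: the `Trie` class is an uncompressed prefix tree standing in for a radix tree, the hypergeometric tail is one common choice of over-representation statistic (the abstract does not say which test the service uses), and all identifiers, entity names and sizes are hypothetical.

```python
from math import comb


class Trie:
    """Uncompressed prefix tree; a radix tree would merge single-child chains."""

    def __init__(self):
        self.children = {}
        self.entity = None  # payload stored at the node that ends an identifier

    def insert(self, identifier, entity):
        node = self
        for ch in identifier:
            node = node.children.setdefault(ch, Trie())
        node.entity = entity

    def lookup(self, identifier):
        """Return the entity stored for identifier, or None if it is unknown."""
        node = self
        for ch in identifier:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.entity


def hypergeom_tail(population, pathway_size, sample_size, hits):
    """P(X >= hits) when sample_size entities are drawn without replacement."""
    upper = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    favourable = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total


# Hypothetical lookup table: identifier -> entity name.
index = Trie()
for identifier, entity in [("ID001", "entity_a"), ("ID002", "entity_b"), ("ID003", "entity_c")]:
    index.insert(identifier, entity)

# Hypothetical pathway and user sample; "UNKNOWN" is not in the lookup table.
pathway = {"entity_a", "entity_b"}
sample = ["ID001", "ID002", "UNKNOWN"]

mapped = {index.lookup(i) for i in sample} - {None}
hits = len(mapped & pathway)
p_value = hypergeom_tail(population=10, pathway_size=len(pathway),
                         sample_size=len(mapped), hits=hits)
print(f"mapped={sorted(mapped)} hits={hits} p={p_value:.4f}")
```

The lookup cost depends on the length of the identifier rather than on the number of stored identifiers, which is the property that makes a tree of this kind attractive as a lookup table for large samples.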

CONCLUSION

Through the use of highly optimised, in-memory data structures and algorithms, Reactome has achieved a stable, high-performance pathway analysis service that analyses genome-wide datasets within seconds, allowing interactive exploration and analysis of high-throughput data. The proposed pathway analysis approach is available on the Reactome production website, either via the AnalysisService for programmatic access or via the user submission interface integrated into the PathwayBrowser. Reactome is an open data and open source project, and all of its source code, including the code described here, is available in the AnalysisTools repository in the Reactome GitHub (https://github.com/reactome/).
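For programmatic access, a client would typically post a list of identifiers to the AnalysisService over HTTP. The sketch below only builds such a request and does not send it. The endpoint URL and the plain-text, one-identifier-per-line body are assumptions not stated in the abstract and should be checked against the current AnalysisService documentation; `build_analysis_request` and the identifiers are hypothetical.

```python
from urllib.request import Request

# Assumed endpoint; verify against the current AnalysisService documentation.
ANALYSIS_URL = "https://reactome.org/AnalysisService/identifiers/projection"


def build_analysis_request(identifiers, url=ANALYSIS_URL):
    """Build (but do not send) a POST request carrying one identifier per line."""
    body = "\n".join(identifiers).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


# Hypothetical identifiers, used only to show the shape of the request.
request = build_analysis_request(["ID001", "ID002"])
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would submit it. The response format is not described in the abstract, so parsing it is left out here.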


Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cdd8/5333408/c8b8ed3f44b7/12859_2017_1559_Fig1_HTML.jpg



