Reproducible processing of TCGA regulatory networks.

Authors

Fanfani Viola, Shutta Katherine H, Mandros Panagiotis, Fischer Jonas, Saha Enakshi, Micheletti Soel, Chen Chen, Guebila Marouen Ben, Lopes-Ramos Camila M, Quackenbush John

Affiliations

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA.

Publication information

bioRxiv. 2024 Nov 7:2024.11.05.622163. doi: 10.1101/2024.11.05.622163.

DOI:10.1101/2024.11.05.622163
PMID:39574772
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11580957/
Abstract

BACKGROUND

Technological advances in sequencing and computation have allowed deep exploration of the molecular basis of diseases. Biological networks have proven to be a useful framework for interrogating omics data and modeling regulatory gene and protein interactions. Large collaborative projects, such as The Cancer Genome Atlas (TCGA), have provided a rich resource for building and validating new computational methods, resulting in a plethora of open-source software for downloading, pre-processing, and analyzing those data. However, for an end-to-end analysis of regulatory networks, a coherent and reusable workflow is essential to integrate all relevant packages into a robust pipeline.

FINDINGS

We developed tcga-data-nf, a Nextflow workflow that allows users to reproducibly infer regulatory networks from the thousands of samples in TCGA using a single command. The workflow can be divided into three main steps: multi-omics data, such as RNA-seq and methylation, are downloaded, preprocessed, and lastly used to infer regulatory network models with the netZoo software tools. The workflow is powered by the NetworkDataCompanion R package, a standalone collection of functions for managing, mapping, and filtering TCGA data. Here we show how the pipeline can be used to study the differences between colon cancer subtypes that could be explained by epigenetic mechanisms. Lastly, we provide pre-generated networks for the 10 most common cancer types that can be readily accessed.
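The three-step structure described above (download, preprocess, infer) can be illustrated with a minimal Python sketch. This is not the tcga-data-nf implementation and does not use Nextflow, NetworkDataCompanion, or netZoo: every function name and the toy data are hypothetical, and plain Pearson correlation stands in for the network inference methods only to show how the stages chain together.

```python
import math
from itertools import combinations


def download_expression():
    """Hypothetical stand-in for the download step: toy counts, gene -> per-sample values."""
    return {
        "GENE_A": [10, 20, 30, 40],
        "GENE_B": [12, 18, 33, 41],
        "GENE_C": [40, 30, 20, 10],
        "GENE_D": [0, 0, 1, 0],  # barely expressed; removed during preprocessing
    }


def preprocess(counts, min_mean=1.0):
    """Drop genes whose mean count is below min_mean, then apply log2(x + 1)."""
    kept = {g: v for g, v in counts.items() if sum(v) / len(v) >= min_mean}
    return {g: [math.log2(x + 1) for x in v] for g, v in kept.items()}


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def infer_network(expr):
    """Assumed stand-in for inference: gene-gene correlation as edge weight."""
    return {
        (g1, g2): pearson(expr[g1], expr[g2])
        for g1, g2 in combinations(sorted(expr), 2)
    }


def run_pipeline():
    """Chain the three stages so one call reproduces the whole analysis."""
    return infer_network(preprocess(download_expression()))


network = run_pipeline()
for edge, weight in sorted(network.items()):
    print(edge, round(weight, 3))
```

Keeping each stage a pure function of the previous stage's output is what makes a rerun reproducible; a workflow manager such as Nextflow adds caching, containerized environments, and parallel execution on top of the same idea.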

CONCLUSIONS

tcga-data-nf is a complete yet flexible and extensible framework that enables the reproducible inference and analysis of cancer regulatory networks, bridging a gap in the current universe of software tools.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ef8/11580957/025c875341db/nihpp-2024.11.05.622163v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ef8/11580957/c4b93b080a41/nihpp-2024.11.05.622163v1-f0001.jpg

