SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants.

Affiliations

Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.

Genomics and Computational Biology Graduate Group.

Publication information

Bioinformatics. 2020 Jun 1;36(12):3879-3881. doi: 10.1093/bioinformatics/btaa246.

DOI:10.1093/bioinformatics/btaa246

PMID:32330239

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7320617/

Abstract

SUMMARY

We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline for characterizing non-coding genome-wide association study (GWAS) findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports the relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional data across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that it is more than 60 times more efficient and scales with data size and the amount of computational resources.
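The integration step described above, matching GWAS variants against functional genomics annotation intervals, can be illustrated with a minimal sketch. This is not SparkINFERNO's implementation and uses neither Apache Spark nor Giggle: it is a plain-Python stand-in that indexes intervals by chromosome and uses binary search on interval starts. All variant IDs, coordinates, p-values, and track names are hypothetical, and the 5e-8 cutoff is the conventional genome-wide significance threshold, assumed here for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical toy GWAS summary statistics: (chrom, position, variant_id, p_value).
variants = [
    ("chr1", 1050, "rsA", 3e-9),
    ("chr1", 5000, "rsB", 2e-8),
    ("chr2", 300, "rsC", 4e-10),
]

# Hypothetical annotation tracks: (chrom, start, end, track), half-open [start, end).
annotations = [
    ("chr1", 1000, 1100, "enhancer:brain"),
    ("chr1", 1040, 1060, "tf_binding:blood"),
    ("chr2", 100, 200, "enhancer:liver"),
]


def build_index(intervals):
    """Group intervals by chromosome, sorted by start, with a parallel list of starts."""
    by_chrom = defaultdict(list)
    for chrom, start, end, track in intervals:
        by_chrom[chrom].append((start, end, track))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        index[chrom] = ([s for s, _, _ in ivs], ivs)
    return index


def overlapping_tracks(index, chrom, pos):
    """Return the sorted track names whose [start, end) interval contains pos."""
    starts, ivs = index.get(chrom, ([], []))
    hi = bisect_right(starts, pos)  # only intervals with start <= pos can contain pos
    return sorted(track for _, end, track in ivs[:hi] if pos < end)


def annotate(variants, index, p_threshold=5e-8):
    """Map each variant passing the p-value threshold to its overlapping tracks."""
    return {
        vid: overlapping_tracks(index, chrom, pos)
        for chrom, pos, vid, p in variants
        if p <= p_threshold
    }


index = build_index(annotations)
result = annotate(variants, index)
print(result)
```

The linear scan over candidate intervals after the binary search keeps the sketch short. A production system would rely on a dedicated genomic index and distribute the queries, which is the role the abstract attributes to Giggle and Apache Spark.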

AVAILABILITY AND IMPLEMENTATION

SparkINFERNO runs on clusters or a single server with an Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno.

CONTACT

lswang@pennmedicine.upenn.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7301/7320617/a91499ab9dca/btaa246f1.jpg
