SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants.

Affiliations

Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.

Genomics and Computational Biology Graduate Group.

Publication information

Bioinformatics. 2020 Jun 1;36(12):3879-3881. doi: 10.1093/bioinformatics/btaa246.

Abstract

SUMMARY

We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline for characterizing non-coding genome-wide association study (GWAS) findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports the relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional data across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that it is more than 60 times more efficient and scales with data size and the amount of computational resources.
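The integration step described above, matching GWAS variants to functional annotations by genomic position, can be illustrated with a minimal sketch. This is not SparkINFERNO's API or algorithm: the `Variant` and `Annotation` types, the `annotate` function, the toy records and the choice of the conventional genome-wide significance threshold (5e-8) as a filter are all assumptions made for illustration. The sketch uses a plain in-memory scan where the pipeline itself relies on Apache Spark and Giggle indexing.

```python
from collections import defaultdict
from typing import NamedTuple


class Variant(NamedTuple):
    rsid: str      # hypothetical identifier
    chrom: str
    pos: int       # 1-based position, as in typical GWAS summary statistics
    pvalue: float


class Annotation(NamedTuple):
    chrom: str
    start: int     # 0-based, half-open interval [start, end), as in BED files
    end: int
    kind: str      # e.g. "enhancer", "TFBS"
    tissue: str


def annotate(variants, annotations, p_threshold=5e-8):
    """Map each significant variant to the annotations overlapping its position.

    Assumption: variants are filtered by a simple p-value cutoff; the real
    pipeline's prioritization is more involved and is not reproduced here.
    """
    # Group annotations by chromosome so each variant scans only its own chromosome.
    by_chrom = defaultdict(list)
    for a in annotations:
        by_chrom[a.chrom].append(a)

    hits = {}
    for v in variants:
        if v.pvalue > p_threshold:
            continue
        pos0 = v.pos - 1  # convert the 1-based position to 0-based coordinates
        hits[v.rsid] = [
            (a.kind, a.tissue)
            for a in by_chrom.get(v.chrom, [])
            if a.start <= pos0 < a.end
        ]
    return hits


# Toy records, invented for illustration only.
variants = [
    Variant("rs_toy1", "chr1", 150, 1e-9),
    Variant("rs_toy2", "chr1", 500, 1e-10),
    Variant("rs_toy3", "chr2", 150, 0.2),
]
annotations = [
    Annotation("chr1", 100, 200, "enhancer", "brain"),
    Annotation("chr1", 140, 160, "TFBS", "liver"),
    Annotation("chr2", 100, 200, "enhancer", "brain"),
]

result = annotate(variants, annotations)
for rsid, overlaps in result.items():
    print(rsid, overlaps)
```

A linear scan like this costs time proportional to the number of variants times the number of annotations per chromosome, which is the kind of cost that interval indexing and distributed execution are meant to avoid at the scale of hundreds of tissues and cell types.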

AVAILABILITY AND IMPLEMENTATION

SparkINFERNO runs on clusters or a single server with an Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno.

CONTACT

lswang@pennmedicine.upenn.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7301/7320617/a91499ab9dca/btaa246f1.jpg
