Cloud-based interactive analytics for terabytes of genomic variants data.

Author information

VA Palo Alto Health Care System, Palo Alto Epidemiology Research and Information Center for Genomics, CA 94304, USA.

Department of Genetics.

Publication information

Bioinformatics. 2017 Dec 1;33(23):3709-3715. doi: 10.1093/bioinformatics/btx468.

Abstract

MOTIVATION

Large-scale genomic sequencing is now widely used to address questions in diverse areas such as biological function, human disease, evolution, ecosystems, and agriculture. Given the quantity and diversity of these data, a robust and scalable solution for data handling and analysis is needed.

RESULTS

We present interactive analytics using a cloud-based columnar database built on Dremel to perform information compression, comprehensive quality control, and biological information retrieval in large volumes of genomic data. We demonstrate that such Big Data computing paradigms can provide orders-of-magnitude faster turnaround for common genomic analyses, transforming long-running batch jobs submitted via a Linux shell into questions that can be asked from a web browser in seconds. Using this method, we assessed a study population of 475 deeply sequenced human genomes for genomic call rate, genotype and allele frequency distribution, variant density across the genome, and pharmacogenomic information.
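To make the quality-control metrics named above more concrete, the following is a minimal Python sketch of how a per-variant call rate, alternate-allele frequency, and genotype distribution could be computed. It is not the authors' BigQuery implementation: the toy genotype table, the variant keys, and all function names are hypothetical, and the per-variant call rate used here is an assumption that may differ from the paper's definition of genomic call rate.

```python
from collections import Counter

# Hypothetical toy genotype table: one entry per variant, one genotype per sample.
# Each genotype is a pair of allele indices (0 = reference, 1 = alternate);
# None marks a no-call. These values are made up for illustration only.
GENOTYPES = {
    "chr1:1000": [(0, 0), (0, 1), (1, 1), None],
    "chr1:2000": [(0, 0), (0, 0), None, None],
}


def call_rate(calls):
    """Fraction of samples with a non-missing genotype call."""
    called = sum(1 for g in calls if g is not None)
    return called / len(calls)


def alt_allele_frequency(calls):
    """Alternate-allele frequency among called alleles (None if nothing was called)."""
    alleles = [a for g in calls if g is not None for a in g]
    if not alleles:
        return None
    return sum(1 for a in alleles if a == 1) / len(alleles)


def genotype_counts(calls):
    """Count hom-ref, het, and hom-alt genotypes among called samples (biallelic only)."""
    labels = {0: "hom_ref", 1: "het", 2: "hom_alt"}
    return Counter(labels[sum(g)] for g in calls if g is not None)


def summarize(genotypes):
    """Collect the three metrics for every variant in the table."""
    return {
        variant: {
            "call_rate": call_rate(calls),
            "alt_af": alt_allele_frequency(calls),
            "genotypes": dict(genotype_counts(calls)),
        }
        for variant, calls in genotypes.items()
    }


if __name__ == "__main__":
    for variant, stats in summarize(GENOTYPES).items():
        print(variant, stats)
```

In a columnar database, the same aggregations would typically be written as grouped counts over a variants table rather than as Python loops; the sketch only illustrates what each metric measures.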

AVAILABILITY AND IMPLEMENTATION

Our analysis framework is implemented on Google Cloud Platform and BigQuery. The code is available at https://github.com/StanfordBioinformatics/mvp_aaa_codelabs.

CONTACT

cuiping@stanford.edu or ptsao@stanford.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
