Odyssey: a semi-automated pipeline for phasing, imputation, and analysis of genome-wide genetic data.

Affiliations

Department of Biology, Indiana University-Purdue University Indianapolis, 723 W. Michigan Street, Indianapolis, IN, USA.

Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 5021 Health Information and Translational Sciences (HITS), 410 West 10th Street, Indianapolis, IN, USA.

Publication information

BMC Bioinformatics. 2019 Jun 28;20(1):364. doi: 10.1186/s12859-019-2964-5.

DOI: 10.1186/s12859-019-2964-5

PMID: 31253090

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6599316/

Abstract

BACKGROUND

Genome imputation, admixture resolution, and genome-wide association analyses are time-consuming and computationally intensive processes with many composite and requisite steps. Analysis time increases further when the programs required for these analyses must be built and installed. For scientists who are less versed in programming but want to perform these operations hands-on, learning to use the vast number of programs available for these analyses involves a lengthy learning curve.

RESULTS

To streamline the entire process into easy-to-use steps for scientists working with big data, the Odyssey pipeline was developed. Odyssey is a simplified, efficient, semi-automated genome-wide imputation and analysis pipeline that prepares raw genetic data and performs pre-imputation quality control, phasing, imputation, post-imputation quality control, population stratification analysis, and genome-wide association with statistical data analysis, including result visualization. It integrates programs such as PLINK, SHAPEIT, Eagle, IMPUTE, Minimac, and several R packages into a seamless, easy-to-use, modular workflow controlled via a single user-friendly configuration file. Odyssey was built with compatibility in mind and therefore uses the Singularity container solution, which can be run on Linux, macOS, and Windows platforms. It also scales easily from a simple desktop to a High-Performance System (HPS).
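The idea of a modular workflow driven by one configuration file can be illustrated with a minimal Python sketch. This is not Odyssey's actual configuration format or code: the stage names, the `skip` key, and the `plan_pipeline` function are hypothetical and chosen only to mirror the steps and tool choices listed in the abstract.

```python
# Hypothetical sketch: stage names and config keys are illustrative only,
# not Odyssey's real configuration format.
STAGES = [
    "prepare_raw_data",
    "pre_imputation_qc",
    "phasing",
    "imputation",
    "post_imputation_qc",
    "population_stratification",
    "gwas",
    "visualization",
]

# Tools named in the abstract for the two stages where a choice is offered.
TOOL_CHOICES = {
    "phasing": {"SHAPEIT", "Eagle"},
    "imputation": {"IMPUTE", "Minimac"},
}


def plan_pipeline(config):
    """Return an ordered list of (stage, tool) pairs for the enabled stages."""
    skipped = set(config.get("skip", []))
    unknown = skipped - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stage(s) to skip: {sorted(unknown)}")

    plan = []
    for stage in STAGES:
        if stage in skipped:
            continue
        tool = None
        if stage in TOOL_CHOICES:
            # Stages with interchangeable tools must name a supported one.
            tool = config.get(stage)
            if tool not in TOOL_CHOICES[stage]:
                allowed = sorted(TOOL_CHOICES[stage])
                raise ValueError(f"{stage} must be one of {allowed}, got {tool!r}")
        plan.append((stage, tool))
    return plan


if __name__ == "__main__":
    example = {"phasing": "Eagle", "imputation": "Minimac", "skip": ["visualization"]}
    for stage, tool in plan_pipeline(example):
        print(stage, tool or "")
```

Validating the whole configuration before any stage runs is one way to keep a long multi-hour analysis from failing midway on a mistyped tool name.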

CONCLUSION

Odyssey facilitates efficient and fast automation of genome-wide association analysis and can go from raw genetic data to genome-phenome association visualization and analysis results in 3-8 h on average, depending on the input data, the choice of programs within the pipeline, and the available computing resources. Odyssey was built to be flexible, portable, compatible, scalable, and easy to set up. Biologists less familiar with programming can now work hands-on with their own big data using this easy-to-use pipeline.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0a88/6599316/8d352651644f/12859_2019_2964_Fig1_HTML.jpg
