kGWASflow: a modular, flexible, and reproducible Snakemake workflow for k-mers-based GWAS.

Affiliations

Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA.

Institute of Plant Breeding, Genetics, and Genomics, University of Georgia, Athens, GA 30602, USA.

Publication information

G3 (Bethesda). 2023 Dec 29;14(1). doi: 10.1093/g3journal/jkad246.

Abstract

Genome-wide association studies (GWAS) have been widely used to identify genetic variation associated with complex traits. Despite its success and popularity, the traditional GWAS approach has a variety of limitations. For this reason, newer GWAS methods have been developed, including the use of pan-genomes instead of a single reference genome and the use of markers beyond single-nucleotide polymorphisms, such as structural variations and k-mers. The k-mers-based GWAS approach in particular has gained attention from researchers in recent years. However, these new methodologies can be complicated and challenging to implement. Here, we present kGWASflow, a modular, user-friendly, and scalable workflow to perform GWAS using k-mers. We adapted the existing kmersGWAS method into an easier and more accessible workflow using management tools like Snakemake and Conda, eliminating the challenges caused by missing dependencies and version conflicts. kGWASflow increases the reproducibility of the kmersGWAS method by automating each step with Snakemake and using containerization tools like Docker. The workflow also includes supplemental components such as quality control, read trimming, and the generation of summary statistics. kGWASflow additionally offers post-GWAS analysis options to identify the genomic location and context of trait-associated k-mers. kGWASflow can be applied to any organism and requires minimal programming skills. kGWASflow is freely available on GitHub (https://github.com/akcorut/kGWASflow) and Bioconda (https://anaconda.org/bioconda/kgwasflow).
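To make the idea of using k-mers as markers concrete, the following is a minimal sketch of how a k-mer presence/absence table might be built from sequencing reads. It is a toy illustration, not the kmersGWAS or kGWASflow implementation: the function names (`canonical`, `kmer_set`, `presence_absence`), the in-memory read lists, and the choice of k = 3 are all assumptions made for the example, and a real analysis would rely on dedicated k-mer counting software and a statistical association test.

```python
def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    complement = str.maketrans("ACGT", "TGCA")
    reverse_complement = kmer.translate(complement)[::-1]
    return min(kmer, reverse_complement)


def kmer_set(reads, k):
    """Collect the canonical k-mers observed in a list of reads (hypothetical helper)."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                kmers.add(canonical(kmer))
    return kmers


def presence_absence(samples, k):
    """Map each k-mer to a 0/1 vector over samples, in the order samples were given."""
    per_sample = {name: kmer_set(reads, k) for name, reads in samples.items()}
    all_kmers = sorted(set().union(*per_sample.values()))
    return {
        kmer: [int(kmer in per_sample[name]) for name in samples]
        for kmer in all_kmers
    }


# Two made-up samples that differ by a single base.
samples = {"s1": ["ACGTA"], "s2": ["ACGGA"]}
table = presence_absence(samples, k=3)
for kmer, row in table.items():
    print(kmer, row)
```

Each row of the resulting table plays the role that a SNP genotype plays in a traditional GWAS: k-mers present in some samples and absent in others can be tested for association with a trait without first aligning reads to a reference genome.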

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ab6/10755180/bbe072adf44b/jkad246f1.jpg
