Canary: an automated tool for the conversion of MaCH imputed dosage files to PLINK files.

Affiliation

Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Hong Kong SAR, China.

Publication information

BMC Bioinformatics. 2022 Jul 27;23(1):304. doi: 10.1186/s12859-022-04822-8.

Abstract

BACKGROUND

Previous studies have demonstrated the value of re-analysing publicly available genetics data with recent analytical approaches. Publicly available datasets, such as the Women's Health Initiative (WHI) dataset offered by the database of Genotypes and Phenotypes (dbGaP), provide a rich resource for researchers to perform multiple analyses, including genome-wide association studies (GWAS). Often, the genetic information of individuals in these datasets is stored in imputed dosage files output by MaCH, namely mldose and mlinfo files. For researchers to perform GWAS with these data, the files must first be converted to a format compatible with their tool of choice, e.g., PLINK. Currently, there is no published tool that easily converts datasets provided as MaCH dosage files into PLINK-ready files.

RESULTS

Herein, we present Canary, a Singularity-based tool that converts MaCH dosage files into PLINK-compatible files with a single line of user input at the command line. We also provide a detailed tutorial on the preparation of phenotype files. In addition, Canary comes with preinstalled software often used in GWAS, further easing the use of HPC systems for researchers.

CONCLUSIONS

Until now, conversion of imputed data in the form of MaCH mldose and mlinfo files had to be completed manually. Canary uses Singularity container technology to let users convert these MaCH files into PLINK-compatible files automatically. Additionally, Canary gives researchers a platform for conducting GWAS more easily, as it contains essential software for such analyses, such as PLINK and Bioconductor. We hope that this tool will greatly increase the ease with which researchers can perform GWAS with imputed data, particularly in HPC environments.
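To make the conversion step concrete, the following is a minimal Python sketch of the kind of manual reshaping the abstract refers to. It is not Canary's implementation, which the abstract does not describe. It assumes the commonly documented MaCH layouts: an mlinfo file with a header row and one marker per line (SNP, Al1, Al2, followed by quality columns), and an mldose file with one individual per line (a `FAMID->INDID` identifier, a label column such as `MLDOSE`, then one dosage per marker, counted for Al1). The output follows the single-dosage layout read by PLINK's `--dosage` option with `format=1`. The function names `read_mlinfo`, `read_mldose` and `to_plink_dosage` are hypothetical.

```python
from typing import Iterable, List, Tuple

Marker = Tuple[str, str, str]                   # (SNP, allele 1, allele 2)
Sample = Tuple[Tuple[str, str], List[float]]    # ((FID, IID), dosages)


def read_mlinfo(lines: Iterable[str]) -> List[Marker]:
    """Parse mlinfo lines, skipping the header row (assumed present)."""
    rows = [line.split() for line in lines if line.strip()]
    return [(r[0], r[1], r[2]) for r in rows[1:]]


def read_mldose(lines: Iterable[str]) -> List[Sample]:
    """Parse mldose lines: 'FAMID->INDID', a label column, then dosages."""
    samples: List[Sample] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        fid, _, iid = fields[0].partition("->")
        # If no '->' separator is present, reuse the ID for both columns.
        samples.append(((fid, iid or fid), [float(x) for x in fields[2:]]))
    return samples


def to_plink_dosage(markers: List[Marker], samples: List[Sample]) -> List[str]:
    """Transpose individual-major dosages into one line per marker."""
    header = ["SNP", "A1", "A2"]
    for (fid, iid), dosages in samples:
        if len(dosages) != len(markers):
            raise ValueError(f"{fid} {iid}: dosage count does not match mlinfo")
        header += [fid, iid]
    out = [" ".join(header)]
    for j, (snp, a1, a2) in enumerate(markers):
        values = [f"{dosages[j]:.3f}" for _, dosages in samples]
        out.append(" ".join([snp, a1, a2] + values))
    return out


# Toy input, invented purely for illustration.
mlinfo = [
    "SNP Al1 Al2 Freq1 MAF Quality Rsq",
    "rs1 A G 0.8 0.2 0.99 0.95",
    "rs2 C T 0.4 0.4 0.90 0.80",
]
mldose = [
    "FAM1->IND1 MLDOSE 1.900 0.100",
    "FAM2->IND2 MLDOSE 1.000 2.000",
]
for row in to_plink_dosage(read_mlinfo(mlinfo), read_mldose(mldose)):
    print(row)
```

A file written this way would typically be supplied to PLINK together with a matching .fam file, for example via `--dosage <file> format=1 --fam <file>`. The exact options depend on the PLINK version, and real mldose files are large enough that a streaming, marker-chunked transpose would be preferable to holding everything in memory.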

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/405f/9327220/38ea03e22220/12859_2022_4822_Fig1_HTML.jpg
