

GEN2VCF: a converter for human genome imputation output format to VCF format.

Author information

Division of Genome Research, Center for Genome Science, National Institute of Health, Osong Health Technology Administration Complex, 187, Osongsaengmyeong 2-ro, Osong-eup, Heungdeok-gu, Cheongju-si, Chungcheongbuk-do, 28159, Republic of Korea.

Database and Bioinformatics Laboratory, Department of Computer Science, College of Electrical and Computer Engineering, Chungbuk National University, 28644, Cheongju, Republic of Korea.

Publication information

Genes Genomics. 2020 Oct;42(10):1163-1168. doi: 10.1007/s13258-020-00982-0. Epub 2020 Aug 16.

Abstract

BACKGROUND

In human genome-wide association studies, genotype imputation is an essential analysis tool for improving association mapping power. When IMPUTE software is used for imputation analysis, the imputation output (GEN format) must be converted to variant call format (VCF) with imputed genotype dosage before association analysis. However, this conversion requires chaining multiple software packages in a pipeline, which takes a large amount of processing time.

OBJECTIVE

We developed GEN2VCF, a fast and convenient GEN format to VCF conversion tool with dosage support.

METHODS

The performance of GEN2VCF was compared with that of BCFtools, QCTOOL, and Oncofunco. The test data set was a GEN-formatted file covering a 1 Mb region for 5000 samples. To assess performance at different sample sizes, tests were run with 1000 to 5000 samples in steps of 1000. Runtime and memory usage were used as performance measures.
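The two performance measures named above can be collected with a small harness. The following is a minimal sketch, not the authors' benchmarking code: the `measure` helper is hypothetical, it assumes a Unix-like system and Python 3.9 or later, and the command to run is supplied by the caller because the abstract does not give the command lines of the tools that were compared.

```python
import os
import subprocess
import sys
import time
from typing import List, Tuple


def measure(cmd: List[str]) -> Tuple[float, int, int]:
    """Run cmd once; return (wall-clock seconds, peak RSS, exit code).

    Peak RSS comes from the rusage record of this specific child process.
    Its unit is platform dependent: kilobytes on Linux, bytes on macOS.
    """
    start = time.perf_counter()
    proc = subprocess.Popen(cmd)
    # os.wait4 reaps the child and returns its own resource usage,
    # so earlier, larger child processes do not distort the figure.
    _, status, usage = os.wait4(proc.pid, 0)
    elapsed = time.perf_counter() - start
    # Record the exit code on the Popen object since we reaped it ourselves.
    proc.returncode = os.waitstatus_to_exitcode(status)
    return elapsed, usage.ru_maxrss, proc.returncode


if __name__ == "__main__":
    # Placeholder command; substitute the real conversion command line.
    seconds, peak_rss, code = measure([sys.executable, "-c", "pass"])
    print(f"runtime={seconds:.3f}s peak_rss={peak_rss} exit={code}")
```

To mirror the design described above, one would call `measure` once per tool for each sample size in `range(1000, 5001, 1000)` and tabulate the results.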

RESULTS

GEN2VCF performed substantially better than the other methods in both runtime and memory usage. Its runtime was at least 1.4-fold lower and its memory usage at least 7.4-fold lower than those of the other methods.

CONCLUSIONS

GEN2VCF provides users with efficient conversion from GEN format to VCF with the best-guessed genotype, genotype posterior probabilities, and genotype dosage, as well as great flexibility in implementation with other software packages in a pipeline.
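To make the three output fields concrete, here is a minimal sketch of how one GEN record could be mapped to a VCF data line. It is an illustration under stated assumptions, not GEN2VCF's actual implementation: the function name `gen_line_to_vcf` is hypothetical, the record is assumed to have five leading columns (SNP ID, rsID, position, allele A, allele B) followed by three genotype probabilities per sample, allele A is treated as REF and allele B as ALT, the chromosome is passed in by the caller, and no probability threshold is applied when calling the best-guess genotype.

```python
def gen_line_to_vcf(line: str, chrom: str) -> str:
    """Convert one GEN record to one VCF data line with GT:GP:DS per sample."""
    fields = line.split()
    _snp_id, rsid, pos, allele_a, allele_b = fields[:5]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError("expected three genotype probabilities per sample")

    genotypes = ("0/0", "0/1", "1/1")
    samples = []
    for i in range(0, len(probs), 3):
        triple = probs[i:i + 3]  # P(AA), P(AB), P(BB)
        # Dosage: expected number of copies of allele B (assumed ALT).
        dosage = triple[1] + 2.0 * triple[2]
        if sum(triple) == 0.0:
            gt = "./."  # no information for this sample
        else:
            # Best-guess genotype: the most probable of the three.
            gt = genotypes[max(range(3), key=lambda k: triple[k])]
        gp = ",".join(f"{p:.3f}" for p in triple)
        samples.append(f"{gt}:{gp}:{dosage:.3f}")

    fixed = [chrom, pos, rsid, allele_a, allele_b, ".", ".", ".", "GT:GP:DS"]
    return "\t".join(fixed + samples)


if __name__ == "__main__":
    record = "--- rs123 1000 A G 0.9 0.1 0 0 0.2 0.8"
    print(gen_line_to_vcf(record, "1"))
```

For the two-sample record in the example, the first sample is called `0/0` with dosage 0.100 and the second `1/1` with dosage 1.800. A complete converter would also need to write the VCF header and sample names, which are omitted here.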


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9dc8/7497724/844e538cc0c7/13258_2020_982_Fig1_HTML.jpg
