Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia.
Institute of Mathematics and Statistics, University of Tartu, Tartu, Estonia.
Sci Rep. 2017 May 31;7(1):2537. doi: 10.1038/s41598-017-02487-5.
We have developed a computational method, FastGT, that counts the frequencies of unique k-mers in FASTQ-formatted genome data and uses this information to infer the genotypes of known variants. FastGT can detect the variants in a 30x genome in less than 1 hour using ordinary low-cost server hardware. The overall concordance with the genotypes of two Illumina "Platinum" genomes is 99.96%, and the concordance with the genotypes of the Illumina HumanOmniExpress is 99.82%. Our method provides a k-mer database that can be used for the simultaneous genotyping of approximately 30 million single nucleotide variants (SNVs), including >23,000 SNVs from the Y chromosome. The source code of the FastGT software is available on GitHub (https://github.com/bioinfo-ut/GenomeTester4/).
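The general idea of genotyping from k-mer counts can be illustrated with a minimal Python sketch. This is not the FastGT implementation: the function names, the toy 5-mers, the `min_count` threshold, and the simple presence/absence calling rule are all assumptions made for illustration, and the sketch ignores reverse-complement k-mers, sequencing errors, and any statistical genotype model.

```python
from collections import Counter


def count_target_kmers(reads, targets, k):
    """Count how often each target k-mer occurs in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in targets:
                counts[kmer] += 1
    return counts


def call_genotype(ref_count, alt_count, min_count=2):
    """Naive threshold call (hypothetical rule, a stand-in for a real statistical model)."""
    has_ref = ref_count >= min_count
    has_alt = alt_count >= min_count
    if has_ref and has_alt:
        return "0/1"  # both alleles observed: heterozygous
    if has_ref:
        return "0/0"  # only the reference allele observed
    if has_alt:
        return "1/1"  # only the alternative allele observed
    return "./."      # too few k-mers to make a call


def genotype(reads, variants, k):
    """variants maps a variant name to a (reference k-mer, alternative k-mer) pair."""
    targets = {kmer for pair in variants.values() for kmer in pair}
    counts = count_target_kmers(reads, targets, k)
    return {
        name: call_genotype(counts[ref], counts[alt])
        for name, (ref, alt) in variants.items()
    }


# Toy example: one known SNV represented by a pair of 5-mers.
variants = {"snv1": ("ACGTA", "ACCTA")}
reads = ["TTACGTAGG", "GGACGTATT", "TTACCTAGG", "CCACCTAAA"]
print(genotype(reads, variants, k=5))
```

In this toy input each allele-specific k-mer is seen twice, so the sketch reports a heterozygous call. Because only k-mers listed for known variants are counted, the reads never need to be aligned to a reference genome, which is the property that makes a k-mer approach fast.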