Department of Biochemistry, Faculty of Science, Palacký University in Olomouc, 78371 Olomouc, Czech Republic.
Division of Plant Sciences, University of Missouri, Columbia, MO 65201, USA.
Genes (Basel). 2023 Jan 1;14(1):123. doi: 10.3390/genes14010123.
The genome-wide association study (GWAS) is a popular genomic approach that identifies genomic regions associated with a phenotype and thus aims to discover the causative mutations (CMs) in the genes underlying that phenotype. However, GWAS discoveries are limited by many factors, and the approach typically identifies associated genomic regions without providing a way to compare the viability of candidate genes and actual CMs. The current methodology is therefore limited in its ability to identify CMs. In our recent work, we presented a novel approach to an empowered "GWAS to Genes" strategy that we named Synthetic phenotype to causative mutation (SP2CM). We established this strategy to identify CMs in soybean genes and developed a web-based tool for accuracy calculation (AccuTool) for a reference panel of soybean accessions. Here, we describe the further development of this tool, named AccuCalc, which extends its use to other species. We enhanced the tool for the analysis of datasets in which a rare phenotype occurs at low frequency by automating the formatting of a synthetic phenotype, and we added another accuracy-based GWAS evaluation criterion to the accuracy calculation. We designed AccuCalc as a species-independent Python package for GWAS data analysis that accepts user-defined input data in variant call format (VCF) or HapMap format (hmp). AccuCalc saves analysis outputs in user-friendly tab-delimited formats and also offers visualization of the GWAS results as Manhattan plots accentuated by accuracy. Implemented in Python, AccuCalc is publicly available and can thus be used conveniently to apply the SP2CM strategy to any species.
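The abstract does not define how accuracy is computed, so the following is only a minimal sketch of one plausible reading: the fraction of accessions whose allele state at a variant agrees with a binary (synthetic) phenotype, taking the better of the two allele orientations. The function name `variant_accuracy`, the 0/1 encoding, and the handling of missing calls are all assumptions made for illustration; this is not AccuCalc's API or its actual formula.

```python
from typing import Optional, Sequence


def variant_accuracy(
    genotypes: Sequence[Optional[int]],
    phenotypes: Sequence[Optional[int]],
) -> float:
    """Share of accessions whose allele state matches a binary phenotype.

    Assumed encoding (hypothetical, not taken from AccuCalc):
    genotypes  -- 0 = reference allele, 1 = alternate allele, None = missing
    phenotypes -- 0 = wild type, 1 = mutant phenotype, None = missing
    """
    if len(genotypes) != len(phenotypes):
        raise ValueError("genotypes and phenotypes must have the same length")

    # Keep only accessions with both a genotype call and a phenotype score.
    pairs = [
        (g, p)
        for g, p in zip(genotypes, phenotypes)
        if g is not None and p is not None
    ]
    if not pairs:
        raise ValueError("no accession has both a genotype and a phenotype")

    # Fraction of accessions where the allele state equals the phenotype class.
    direct = sum(1 for g, p in pairs if g == p) / len(pairs)

    # The alternate allele may track either phenotype class, so report the
    # better of the two orientations.
    return max(direct, 1.0 - direct)


if __name__ == "__main__":
    # Six accessions; the last one has a missing genotype call and is skipped.
    calls = [1, 1, 0, 0, 0, None]
    traits = [1, 1, 0, 0, 1, 0]
    print(variant_accuracy(calls, traits))
```

Under these assumptions, a variant that perfectly separates the two phenotype classes scores 1.0 regardless of which allele is labelled as the alternate, while a variant unrelated to the phenotype scores close to 0.5 when the classes are balanced.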