Boatwright J Lucas, Sapkota Sirjan, Kresovich Stephen
Department of Plant and Environmental Sciences, Clemson University, Clemson, SC, United States.
Advanced Plant Technology, Clemson University, Clemson, SC, United States.
Front Genet. 2023 Mar 31;14:1143395. doi: 10.3389/fgene.2023.1143395. eCollection 2023.
High-throughput genomic and phenomic data have enhanced the ability to detect genotype-to-phenotype associations that can resolve broad pleiotropic effects of mutations on plant phenotypes. As the scale of genotyping and phenotyping has advanced, rigorous methodologies have been developed to accommodate larger datasets and maintain statistical precision. However, determining the functional effects of associated genes/loci is expensive and limited because of the complexity of cloning and subsequent characterization. Here, we applied phenomic imputation, which imputes missing data using kinship and correlated traits, to a multi-year, multi-environment dataset, and we screened insertions and deletions (InDels) from the recently whole-genome sequenced Sorghum Association Panel for putative loss-of-function effects. Candidate loci from genome-wide association results were screened for potential loss of function using a Bayesian Genome-Phenome Wide Association Study (BGPWAS) model across both functionally characterized and uncharacterized loci. Our approach is designed to facilitate validation of associations beyond traditional candidate-gene and literature-search approaches, to aid the identification of putative variants for functional analysis, and to reduce the incidence of false-positive candidates in current functional validation methods. Using this BGPWAS model, we identified associations for previously characterized genes with known loss-of-function alleles, specific genes falling within known quantitative trait loci, and genes without any previous genome-wide associations, while additionally detecting putative pleiotropic effects. In particular, we were able to identify the major tannin haplotypes at the locus and effects of InDels on the protein folding. Depending on the haplotype present, heterodimer formation with was significantly affected.
We also identified major-effect InDels in and , where proteins were truncated due to frameshift mutations that resulted in early stop codons. These truncated proteins also lost most of their functional domains, suggesting that these InDels likely result in loss of function. Here, we show that the BGPWAS model is able to identify loss-of-function alleles that can have significant effects on protein structure and folding as well as multimer formation. Our approach to characterizing loss-of-function mutations and their functional repercussions will facilitate precision genomics and breeding by identifying key targets for gene editing and trait integration.
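The truncation mechanism described above (a frameshift InDel introducing an early stop codon) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy coding sequence, and the VCF-style `pos`/`ref`/`alt` convention (0-based here) are all assumptions made for illustration, and no real sorghum gene is represented.

```python
from itertools import product

# Standard genetic code, built with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_indel(cds: str, pos: int, ref: str, alt: str) -> str:
    """Apply a VCF-style InDel (hypothetical 0-based pos) to a coding sequence."""
    if cds[pos:pos + len(ref)] != ref:
        raise ValueError("reference allele does not match the sequence")
    return cds[:pos] + alt + cds[pos + len(ref):]


def indel_effect(cds: str, pos: int, ref: str, alt: str) -> dict:
    """Summarize whether an InDel shifts the frame and how much protein remains."""
    wild_type = translate(cds)
    mutant = translate(apply_indel(cds, pos, ref, alt))
    return {
        "frameshift": (len(alt) - len(ref)) % 3 != 0,
        "wild_type": wild_type,
        "mutant": mutant,
        "fraction_retained": len(mutant) / len(wild_type),
    }


# Toy coding sequence (illustrative only): ATG GCT GAA TTG ACC AAA GGT CTG TAA
toy_cds = "ATGGCTGAATTGACCAAAGGTCTGTAA"
# A 1-bp deletion (ref "AT" -> alt "A") shifts the frame and exposes a TGA stop.
effect = indel_effect(toy_cds, pos=8, ref="AT", alt="A")
print(effect)
```

In this toy case the deletion leaves three of the eight residues. A real screen would also need transcript annotation, strand handling, and domain coordinates to judge how many functional domains are lost.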