Department of Plant Production and Genetics, Faculty of Agriculture, Urmia University, Urmia, Iran.
Dryland Agricultural Research Institute (DARI), Agriculture Research, Education and Extension Organization (AREEO), Maragheh, Iran.
Sci Rep. 2023 Jun 19;13(1):9927. doi: 10.1038/s41598-023-36134-z.
Principal component analysis (PCA) is widely used in genetics studies. In this study, we explicitly evaluated the role of classical PCA (cPCA) and robust PCA (rPCA) in genome-wide association studies (GWAS). We evaluated 294 wheat genotypes under well-watered and rain-fed conditions, focusing on spike traits. First, we showed that, according to cPCA and several rPCA algorithms (Proj, Grid, Hubert, and Locantore), some phenotypic and genotypic observations could be outliers. Hubert's method was the most effective at identifying outliers, which helped to clarify the nature of these samples. These outliers caused trait heritability estimates to deviate from their actual values. We then performed GWAS with 36,000 single nucleotide polymorphisms (SNPs) using the conventional approach and two robust strategies. In the conventional approach, with the first three cPCA components used as population structure, 184 and 139 marker-trait associations (MTAs) were identified for five traits in the well-watered and rain-fed environments, respectively. In the first robust strategy, in which rPCA was used as population structure in GWAS, the Hubert and Grid methods identified new MTAs, especially for yield and spike weight on chromosomes 7A and 6B. In the second strategy, we performed classical and robust principal component-based GWAS, in which the first two PCs obtained from the phenotypic variables were used in place of the individual traits. In this latter strategy, although the methods gave similar results, some new MTAs were identified that can be considered pleiotropic. Hubert's method provided a better linear combination of traits, as it shared the most MTAs with the conventional approach. Newly identified SNPs, including rs19833 (5B) and rs48316 (2B), were annotated to important genes involved in vital biological processes and molecular functions.
The approaches presented in this study can reduce misleading GWAS results caused by the adverse effect of outlier observations.
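The outlier screening and PC-covariate GWAS summarized above can be illustrated with a minimal Python sketch (NumPy and SciPy). It is not the authors' pipeline, and several pieces are assumptions. The robust PCA shown is Locantore-style spherical PCA (one of the four algorithms named, chosen because it is short to implement), not Hubert's ROBPCA. Outliers are flagged with score-distance and orthogonal-distance cut-offs in the style of Hubert's outlier map. The association test is a plain per-SNP least-squares regression with PC scores as fixed covariates, because the abstract does not state which GWAS model was used. All function names (`spatial_median`, `spherical_pca`, `outlier_map`, `gwas_with_pcs`) and the simulated data are hypothetical.

```python
# Sketch only: Locantore-style spherical PCA stands in for Hubert's ROBPCA,
# and GWAS is plain per-SNP OLS with PC covariates. All names are hypothetical.
import numpy as np
from scipy import stats


def spatial_median(X, n_iter=500, tol=1e-10):
    """Weiszfeld iterations for the L1 (spatial) median of the rows of X."""
    m = np.median(X, axis=0)
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(X - m, axis=1), tol)  # avoid divide-by-zero
        m_new = (X / d[:, None]).sum(axis=0) / (1.0 / d).sum()
        converged = np.linalg.norm(m_new - m) < tol
        m = m_new
        if converged:
            break
    return m


def spherical_pca(X, k):
    """Spherical PCA: returns centre, loadings P (p x k), robust score variances."""
    center = spatial_median(X)
    Z = X - center
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    U = Z / np.where(norms == 0, 1.0, norms)  # project centred rows onto the unit sphere
    # Right singular vectors of U = eigenvectors of the spatial sign covariance matrix
    _, _, Vt = np.linalg.svd(U, full_matrices=False)
    P = Vt[:k].T
    # Robust variance of each component: squared normal-consistent MAD of the scores
    lam = stats.median_abs_deviation(Z @ P, axis=0, scale="normal") ** 2
    return center, P, lam


def outlier_map(X, center, P, lam, alpha=0.975):
    """Flag rows whose score distance (SD) or orthogonal distance (OD) exceeds a cut-off."""
    Z = X - center
    T = Z @ P
    sd = np.sqrt((T ** 2 / lam).sum(axis=1))
    od = np.linalg.norm(Z - T @ P.T, axis=1)
    sd_cut = np.sqrt(stats.chi2.ppf(alpha, P.shape[1]))
    # OD^(2/3) is treated as roughly normal (Wilson-Hilferty), with robust centre/scale
    od23 = od ** (2 / 3)
    mad = stats.median_abs_deviation(od23, scale="normal")
    od_cut = (np.median(od23) + mad * stats.norm.ppf(alpha)) ** 1.5
    return (sd > sd_cut) | (od > od_cut), sd, od


def gwas_with_pcs(y, G, pcs):
    """Per-SNP OLS y ~ 1 + PCs + SNP; returns the two-sided p-value of each SNP effect."""
    n = len(y)
    base = np.column_stack([np.ones(n), pcs])
    pvals = np.empty(G.shape[1])
    for j in range(G.shape[1]):
        Xd = np.column_stack([base, G[:, j]])
        beta, _, rank, _ = np.linalg.lstsq(Xd, y, rcond=None)
        resid = y - Xd @ beta
        dof = n - rank
        se = np.sqrt(resid @ resid / dof * np.linalg.pinv(Xd.T @ Xd)[-1, -1])
        pvals[j] = 2 * stats.t.sf(abs(beta[-1] / se), dof)
    return pvals


# Simulated toy data (not from the study)
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 10)) * np.array([5.0, 3.0] + [1.0] * 8)
X[:5] += 50.0  # five gross outliers
center, P, lam = spherical_pca(X, k=2)
flags, sd, od = outlier_map(X, center, P, lam)
print("flagged rows:", np.flatnonzero(flags))

G = rng.integers(0, 3, size=(n, 50)).astype(float)  # 0/1/2 allele counts
y = 1.0 * G[:, 0] + rng.normal(size=n)  # SNP 0 is the only causal SNP
c_g, P_g, _ = spherical_pca(G, k=3)
pcs = (G - c_g) @ P_g  # first three robust PCs as population-structure covariates
pvals = gwas_with_pcs(y, G, pcs)
print("p-value of causal SNP:", pvals[0])
```

The algorithm names in the abstract appear to match the R package rrcov classes `PcaProj`, `PcaGrid`, `PcaHubert` and `PcaLocantore`, which would be a more reliable choice than a hand-rolled version; the abstract does not say which software was used. For the second strategy, a column of phenotype PC scores (for example `((X - center) @ P)[:, 0]`) could be passed as `y` in place of a single trait.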