The Department of Industrial and Systems Engineering, Kongju National University, Cheonan, South Korea.
PLoS One. 2020 Jul 15;15(7):e0236139. doi: 10.1371/journal.pone.0236139. eCollection 2020.
In this study, we propose a hypothesis test that is robust to different genotype encodings in genome-wide association analysis of continuous traits. When population stratification is corrected using a method based on principal component analysis, ordinally (or categorically) encoded genotypes are adjusted and become continuous values. Because of this adjustment, the result of an association test using conventional methods, such as the test of Pearson's correlation coefficient, was shown to depend on how the genotypes were encoded. To overcome this shortcoming, we propose a non-parametric test based on Kendall's tau. Because Kendall's tau captures rank-based, rather than value-based, associations between adjusted genotype and phenotype values, Kendall's test can be more robust than Pearson's test under different genotype encodings. We assessed the robustness of Kendall's test and compared it with that of Pearson's test in terms of the difference in p-values obtained using different genotype encodings. Using both simulated and real data sets, we demonstrated that Kendall's test was more robust than Pearson's test under different genotype encodings. The proposed method is applicable to a broad range of topics in population genetics and comparative genomics in which novel genetic variants are associated with traits. This study may also encourage a more cautious approach to genotype encoding in numerical analysis.
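The distinction between value-based and rank-based association can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The toy data, the two encodings (`additive_012` and `hypothetical_013`), and the single covariate `PC1` standing in for a principal component are all assumptions made for illustration, and the adjustment is reduced to a simple linear-regression residual. The sketch compares correlation coefficients only, whereas the study compares p-values, so its output says nothing about the reported results.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient (value-based association)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def kendall_tau_b(x, y):
    """Kendall's tau-b (rank-based association), O(n^2) pair counting."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from all counts
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom

def residualize(values, covariate):
    """Residuals of a simple linear regression of values on one covariate.

    Simplified stand-in for a PCA-based stratification correction (assumption).
    """
    n = len(values)
    mv, mc = sum(values) / n, sum(covariate) / n
    slope = (sum((c - mc) * (v - mv) for c, v in zip(covariate, values))
             / sum((c - mc) ** 2 for c in covariate))
    return [v - mv - slope * (c - mc) for v, c in zip(values, covariate)]

def encode(genotypes, mapping):
    """Map minor-allele counts (0, 1, 2) to numeric codes."""
    return [mapping[g] for g in genotypes]

# Toy data, invented for illustration only.
GENOTYPES = [0, 0, 1, 1, 1, 2, 2, 2]
PHENOTYPE = [1.0, 1.4, 1.9, 2.3, 2.0, 3.5, 3.1, 3.9]
PC1 = [-1.2, -0.8, -0.5, 0.1, 0.3, 0.4, 0.9, 0.8]  # hypothetical covariate

# Two strictly increasing encodings; the second one is hypothetical.
ENCODINGS = {
    "additive_012": {0: 0, 1: 1, 2: 2},
    "hypothetical_013": {0: 0, 1: 1, 2: 3},
}

if __name__ == "__main__":
    adj_pheno = residualize(PHENOTYPE, PC1)
    for name, mapping in ENCODINGS.items():
        raw = encode(GENOTYPES, mapping)
        adj = residualize(raw, PC1)  # adjusted genotypes are continuous
        print(name,
              "raw r=%.4f" % pearson_r(raw, PHENOTYPE),
              "raw tau=%.4f" % kendall_tau_b(raw, PHENOTYPE),
              "adj r=%.4f" % pearson_r(adj, adj_pheno),
              "adj tau=%.4f" % kendall_tau_b(adj, adj_pheno))
```

On the unadjusted codes, the two encodings order the individuals identically, so Kendall's tau is the same for both while Pearson's r is not. After adjustment the ranks themselves can shift, so neither statistic is guaranteed to be invariant. This is consistent with the abstract describing Kendall's test as more robust rather than invariant.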