Fernandes Júnior Gerardo A, Carvalheiro Roberto, de Oliveira Henrique N, Sargolzaei Mehdi, Costilla Roy, Ventura Ricardo V, Fonseca Larissa F S, Neves Haroldo H R, Hayes Ben J, de Albuquerque Lucia G
School of Agricultural and Veterinarian Sciences, UNESP, Jaboticabal, SP, 14884-900, Brazil.
National Council for Scientific and Technological Development, CNPq, Brasília, DF, 71605-001, Brazil.
Genet Sel Evol. 2021 Mar 12;53(1):27. doi: 10.1186/s12711-021-00622-5.
A cost-effective strategy to exploit the complete DNA sequence of animals for genetic evaluation is to sequence key ancestors of a population and then use imputation to infer the marker genotypes that were not originally reported in a target population of animals genotyped with single nucleotide polymorphism (SNP) panels. The feasibility of this process relies on the accuracy of genotype imputation in that population, particularly for potential causal mutations, which may be at low frequency and located within genes or regulatory regions. The objective of the present study was to investigate the accuracy of imputation to the sequence level in a Nellore beef cattle population, including for variants in annotation classes that are more likely to be functional.
Information from 151 key sequenced Nellore sires was used to assess the accuracy of imputation from the bovine HD BeadChip SNP panel (~ 777 k) to whole-genome sequence. The sires were chosen to optimize the imputation accuracy of a genotypic database comprising about 10,000 genotyped Nellore animals. Genotype imputation was performed using two computational approaches: FImpute3 and Minimac4 (after phasing with Eagle). Imputation accuracy was evaluated using a fivefold cross-validation scheme and measured as the squared correlation between observed and imputed genotypes, calculated per individual and per SNP. SNPs were classified into a range of annotation classes, and imputation accuracy within each annotation class was also evaluated.
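The accuracy measure described above can be illustrated with a minimal sketch. It assumes genotypes are coded as allele dosages (0, 1, 2) and stored as one row per animal and one column per SNP. The function names, the random fold assignment, and the toy data are illustrative assumptions; they are not taken from the study, which does not state how its folds were formed.

```python
import random


def five_folds(animal_ids, seed=0):
    """Randomly split animal IDs into five validation folds (assumed scheme)."""
    ids = list(animal_ids)
    random.Random(seed).shuffle(ids)
    return [ids[k::5] for k in range(5)]


def squared_correlation(observed, imputed):
    """Squared Pearson correlation between two equal-length genotype vectors.

    Returns None when either vector has zero variance, because the
    correlation is undefined in that case.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_i = sum(imputed) / n
    cov = sum((o - mean_o) * (i - mean_i) for o, i in zip(observed, imputed))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_o == 0 or var_i == 0:
        return None
    return cov * cov / (var_o * var_i)


def accuracy_by_animal(observed, imputed):
    """One squared correlation per row (animal), taken across SNPs."""
    return [squared_correlation(o, i) for o, i in zip(observed, imputed)]


def accuracy_by_snp(observed, imputed):
    """One squared correlation per column (SNP), taken across animals."""
    obs_cols = zip(*observed)
    imp_cols = zip(*imputed)
    return [squared_correlation(o, i) for o, i in zip(obs_cols, imp_cols)]


# Toy data: 3 animals x 4 SNPs (hypothetical values, not study data).
observed = [
    [0, 1, 2, 0],
    [1, 1, 0, 0],
    [2, 0, 1, 0],
]
imputed = [
    [0, 1, 2, 0],
    [1, 2, 0, 0],
    [2, 0, 1, 1],
]

folds = five_folds(range(151))
per_animal = accuracy_by_animal(observed, imputed)
per_snp = accuracy_by_snp(observed, imputed)
```

In this sketch a SNP that is monomorphic among the validation animals yields `None` rather than a number; how such SNPs were handled in the study is not stated in the abstract.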
High average imputation accuracies per animal were achieved using both FImpute3 (0.94) and Minimac4 (0.95). On average, common variants (minor allele frequency (MAF) > 0.03) were more accurately imputed by Minimac4, whereas low-frequency variants (MAF ≤ 0.03) were more accurately imputed by FImpute3. The Rsq imputation quality statistic reported by Minimac4 appears to be a good indicator of the empirical Minimac4 imputation accuracy. Both programs provided high average SNP-wise imputation accuracy for all classes of biological annotations.
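The comparison by frequency class can be sketched as follows, assuming genotypes are coded as allele dosages (0, 1, 2) and that a SNP-wise accuracy value is already available for each SNP. The 0.03 threshold comes from the text above; the function names and the toy numbers are hypothetical and do not reproduce the study's results.

```python
def minor_allele_frequency(genotypes):
    """MAF of one SNP from allele-dosage genotypes coded 0, 1, 2."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)


def mean_accuracy_by_maf_class(mafs, accuracies, threshold=0.03):
    """Average SNP-wise accuracy for common (MAF > threshold) and
    low-frequency (MAF <= threshold) variants.

    SNPs whose accuracy is None (undefined) are skipped. A class with no
    SNPs gets None instead of a mean.
    """
    totals = {"common": [0.0, 0], "low_frequency": [0.0, 0]}
    for maf, acc in zip(mafs, accuracies):
        if acc is None:
            continue
        key = "common" if maf > threshold else "low_frequency"
        totals[key][0] += acc
        totals[key][1] += 1
    return {k: (s / n if n else None) for k, (s, n) in totals.items()}


# Hypothetical MAFs and SNP-wise accuracies for four SNPs.
example = mean_accuracy_by_maf_class(
    mafs=[0.25, 0.5, 0.01, 0.03],
    accuracies=[0.5, 1.0, 0.25, 0.75],
)
```

Running the same summary once per imputation program would give the kind of side-by-side comparison the paragraph describes.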
Our results indicate that imputation to whole-genome sequence is feasible in Nellore beef cattle since high imputation accuracies per individual are expected. SNP-wise imputation accuracy is software-dependent, especially for rare variants. The accuracy of imputation appears to be relatively independent of annotation classification.