Accuracy of imputation to whole-genome sequence in sheep.

Author information

Agriculture Victoria, AgriBio, Centre for AgriBioscience, 5 Ring Rd, Bundoora, VIC, 3083, Australia.

Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW, 2351, Australia.

Publication information

Genet Sel Evol. 2019 Jan 17;51(1):1. doi: 10.1186/s12711-018-0443-5.

Abstract

BACKGROUND

The use of whole-genome sequence (WGS) data for genomic prediction and association studies is highly desirable because the causal mutations should be present in the data. The sequencing of 935 sheep from a range of breeds provides the opportunity to impute sheep genotyped with single nucleotide polymorphism (SNP) arrays to WGS. This study evaluated the accuracy of imputation from SNP genotypes to WGS using this reference population of 935 sequenced sheep.

RESULTS

The accuracy of imputation from the Ovine Infinium HD BeadChip SNP (~500 k) to WGS was assessed for three target breeds: Merino, Poll Dorset and F1 Border Leicester × Merino. Imputation accuracy was highest for the Poll Dorset breed, although there were more Merino individuals in the sequenced reference population than Poll Dorset individuals. In addition, empirical imputation accuracies were higher (by up to 1.7%) when using larger multi-breed reference populations compared to using a smaller single-breed reference population. The mean accuracy of imputation across target breeds using the Minimac3 or the FImpute software was 0.94. The empirical imputation accuracy varied considerably across the genome; six chromosomes carried regions of one or more Mb with a mean imputation accuracy of <0.7. Imputation accuracy in five variant annotation classes ranged from 0.87 (missense) up to 0.94 (intronic variants), where lower accuracy corresponded to higher proportions of rare alleles. The imputation quality statistic reported from Minimac3 (R²) had a clear positive relationship with the empirical imputation accuracy. Therefore, by first discarding imputed variants with an R² below 0.4, the mean empirical accuracy across target breeds increased to 0.97. Although accuracy of genomic prediction was less affected by filtering on R² in a multi-breed population of sheep with imputed WGS, the genomic heritability clearly tended to be lower when using variants with an R² ≤ 0.4.
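The filtering step described above can be illustrated with a minimal sketch. It assumes that empirical accuracy is measured per variant as the Pearson correlation between true sequence genotypes and imputed dosages, which the abstract does not state explicitly. The dictionary keys (`r2`, `true`, `imputed`), the helper names and the toy data are all hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined when either vector is constant
    return sxy / sqrt(sxx * syy)


def mean_accuracy(variants, min_r2=None):
    """Mean per-variant accuracy; variants with r2 below min_r2 are discarded."""
    accs = []
    for v in variants:
        if min_r2 is not None and v["r2"] < min_r2:
            continue  # drop poorly imputed variants before averaging
        acc = pearson(v["true"], v["imputed"])
        if acc is not None:
            accs.append(acc)
    return sum(accs) / len(accs) if accs else None


# Toy data (hypothetical): genotypes coded 0/1/2, imputed values as dosages.
variants = [
    {"r2": 0.95, "true": [0, 1, 2, 1, 0], "imputed": [0.0, 1.0, 2.0, 1.0, 0.0]},
    {"r2": 0.20, "true": [0, 0, 1, 0, 2], "imputed": [0.5, 0.4, 0.5, 0.6, 0.5]},
]

print("all variants:", mean_accuracy(variants))
print("quality >= 0.4:", mean_accuracy(variants, min_r2=0.4))
```

In this toy example the poorly imputed variant pulls the unfiltered mean down, and removing it raises the mean, which mirrors the direction of the effect reported in the abstract but not its magnitude.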

CONCLUSIONS

The mean imputation accuracy was high for all target breeds and was increased by combining smaller breed sets into a multi-breed reference. We found that the Minimac3 software imputation quality statistic (R²) was a useful indicator of empirical imputation accuracy, enabling removal of very poorly imputed variants before downstream analyses.
