The size and composition of haplotype reference panels impact the accuracy of imputation from low-pass sequencing in cattle.

Affiliation

Animal Genomics, ETH Zürich, Universitätstrasse 2, Zürich, 8092, Switzerland.

Publication information

Genet Sel Evol. 2023 May 11;55(1):33. doi: 10.1186/s12711-023-00809-y.

Abstract

BACKGROUND

Low-pass sequencing followed by sequence variant genotype imputation is an alternative to the routine microarray-based genotyping in cattle. However, the impact of haplotype reference panels and their interplay with the coverage of low-pass whole-genome sequencing data have not been sufficiently explored in typical livestock settings where only a small number of reference samples is available.

METHODS

Sequence variant genotyping accuracy was compared between two variant callers, GATK and DeepVariant, in 50 Brown Swiss cattle with sequencing coverages ranging from 4- to 63-fold. Haplotype reference panels of varying sizes and composition were built with DeepVariant based on 501 individuals from nine breeds. High-coverage sequence data for 24 Brown Swiss cattle were downsampled to between 0.01- and 4-fold to mimic low-pass sequencing. GLIMPSE was used to infer sequence variant genotypes from the low-pass sequencing data using different haplotype reference panels. The accuracy of the sequence variant genotypes that were inferred from low-pass sequencing data was compared with sequence variant genotypes called from high-coverage data.
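The downsampling step can be illustrated with a small calculation. The sketch below is not the authors' pipeline; it only shows how the fraction of reads to keep would follow from the original and target coverage, assuming coverage scales linearly with the number of retained reads. The function name and the 20-fold starting coverage are hypothetical; the target coverages are values mentioned in the abstract.

```python
def downsample_fraction(original_coverage: float, target_coverage: float) -> float:
    """Return the fraction of reads to keep to reach the target coverage.

    Assumes coverage is proportional to the number of retained reads
    (an assumption of this sketch, not a detail stated in the abstract).
    """
    if original_coverage <= 0:
        raise ValueError("original coverage must be positive")
    if not 0 < target_coverage <= original_coverage:
        raise ValueError("target coverage must be in (0, original coverage]")
    return target_coverage / original_coverage


if __name__ == "__main__":
    original = 20.0  # hypothetical high-coverage sample
    for target in (0.01, 0.25, 4.0):  # coverages mentioned in the abstract
        fraction = downsample_fraction(original, target)
        print(f"{target}-fold from {original}-fold: keep {fraction:.4%} of reads")
```

A read-subsampling tool would then be given this fraction, together with a fixed random seed if the downsampling needs to be reproducible.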

RESULTS

DeepVariant was used to establish bovine haplotype reference panels because it outperformed GATK in all evaluations. For all target sample coverages and allele frequencies, within-breed haplotype reference panels imputed sequence variant genotypes from low-pass sequencing more accurately and efficiently than equally sized multibreed haplotype reference panels. F1 scores greater than 0.9, which indicate a high harmonic mean of recall and precision of the called genotypes, were achieved with 0.25-fold sequencing coverage when large breed-specific haplotype reference panels (n = 150) were used. In the absence of such large within-breed haplotype panels, variant genotyping accuracy from low-pass sequencing could be increased either by adding non-related samples to the haplotype reference panel or by increasing the coverage of the low-pass sequencing data. Sequence variant genotyping from low-pass sequencing was substantially less accurate when the reference panel lacked individuals from the target breed.
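For readers unfamiliar with the F1 score, the following is a minimal sketch of how precision, recall, and their harmonic mean could be computed when imputed genotypes are compared with truth genotypes from high-coverage data. The counting convention is an assumption of this sketch, since the abstract does not specify it: genotypes are coded as alternate-allele counts (0, 1, 2), a matching non-reference genotype is a true positive, and a mismatch counts as a false positive if the imputed genotype is non-reference and as a false negative if the truth genotype is non-reference. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def genotype_f1(truth: Sequence[int], imputed: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for imputed vs. truth genotypes.

    Genotypes are alternate-allele counts (0, 1 or 2) at the same sites.
    The TP/FP/FN convention is an assumption of this sketch.
    """
    if len(truth) != len(imputed):
        raise ValueError("truth and imputed must cover the same sites")

    tp = fp = fn = 0
    for t, g in zip(truth, imputed):
        if t == g:
            if t > 0:      # matching non-reference genotype
                tp += 1
        else:
            if g > 0:      # wrong non-reference call
                fp += 1
            if t > 0:      # true variant genotype not recovered
                fn += 1

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    truth = [0, 1, 2, 1, 0, 2]      # made-up example genotypes
    imputed = [0, 1, 1, 1, 1, 2]
    p, r, f1 = genotype_f1(truth, imputed)
    print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, an F1 score above 0.9 requires both precision and recall to be high.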

CONCLUSIONS

Variant genotyping is more accurate with DeepVariant than with GATK. DeepVariant is therefore suitable for establishing bovine haplotype reference panels. Medium-sized breed-specific haplotype reference panels and large multibreed haplotype reference panels enable accurate imputation of low-pass sequencing data in a typical cattle breed.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9abe/10173671/75df926feae5/12711_2023_809_Fig1_HTML.jpg
