Best practices for analyzing imputed genotypes from low-pass sequencing in dogs.

Affiliations

Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 50 South Drive, Building 50, Room 5351, Bethesda, MD, 20892, USA.

State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China.

Publication information

Mamm Genome. 2022 Mar;33(1):213-229. doi: 10.1007/s00335-021-09914-z. Epub 2021 Sep 8.

DOI:10.1007/s00335-021-09914-z

PMID:34498136

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC8913487/

Abstract

Although DNA array-based approaches for genome-wide association studies (GWAS) permit the collection of thousands of low-cost genotypes, it is often at the expense of resolution and completeness, as SNP chip technologies are ultimately limited by SNPs chosen during array development. An alternative low-cost approach is low-pass whole genome sequencing (WGS) followed by imputation. Rather than relying on high levels of genotype confidence at a set of select loci, low-pass WGS and imputation rely on the combined information from millions of randomly sampled low-confidence genotypes. To investigate low-pass WGS and imputation in the dog, we assessed accuracy and performance by downsampling 97 high-coverage (> 15×) WGS datasets from 51 different breeds to approximately 1× coverage, simulating low-pass WGS. Using a reference panel of 676 dogs from 91 breeds, genotypes were imputed from the downsampled data and compared to a truth set of genotypes generated from high-coverage WGS. Using our truth set, we optimized a variant quality filtering strategy that retained approximately 80% of 14 M imputed sites and lowered the imputation error rate from 3.0% to 1.5%. Seven million sites remained with a MAF > 5% and an average imputation quality score of 0.95. Finally, we simulated the impact of imputation errors on outcomes for case-control GWAS, where small effect sizes were most impacted and medium-to-large effect sizes were minorly impacted. These analyses provide best practice guidelines for study design and data post-processing of low-pass WGS-imputed genotypes in dogs.
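The evaluation described in the abstract (comparing imputed genotypes against a high-coverage truth set, then filtering sites on imputation quality and MAF) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Site` record, the function names, the toy genotypes, and the 0.9 quality cutoff are all hypothetical, chosen only to show how a discordance-based error rate and a retained-site fraction might be computed. The 5% MAF cutoff mirrors the MAF > 5% subset mentioned in the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    """One variant site (hypothetical record, not a format from the paper)."""
    truth: List[int]    # truth genotypes per sample as alt-allele counts (0, 1, 2)
    imputed: List[int]  # imputed genotypes, same coding and sample order
    quality: float      # per-site imputation quality score in [0, 1]


def maf(genotypes: List[int]) -> float:
    """Minor allele frequency from diploid alt-allele counts."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def error_rate(sites: List[Site]) -> float:
    """Fraction of genotype calls where imputed differs from truth."""
    total = sum(len(s.truth) for s in sites)
    wrong = sum(t != i for s in sites for t, i in zip(s.truth, s.imputed))
    return wrong / total if total else float("nan")


def filter_sites(sites: List[Site], min_quality: float, min_maf: float) -> List[Site]:
    """Keep sites passing both the quality and the MAF threshold (assumed cutoffs)."""
    return [s for s in sites if s.quality >= min_quality and maf(s.imputed) > min_maf]


# Toy data: three sites, four samples each (invented for illustration).
sites = [
    Site(truth=[0, 1, 2, 1], imputed=[0, 1, 2, 1], quality=0.98),
    Site(truth=[0, 0, 1, 2], imputed=[0, 1, 1, 2], quality=0.95),
    Site(truth=[0, 0, 0, 1], imputed=[1, 1, 0, 0], quality=0.40),
]

kept = filter_sites(sites, min_quality=0.9, min_maf=0.05)
retained_fraction = len(kept) / len(sites)

print(f"error rate before filtering: {error_rate(sites):.3f}")
print(f"error rate after filtering:  {error_rate(kept):.3f}")
print(f"sites retained: {retained_fraction:.0%}")
```

In this toy case the low-quality site carries most of the discordant calls, so dropping it lowers the error rate while sacrificing a third of the sites. The abstract reports the same kind of trade-off at genome scale.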

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/868d/8913487/643cfbdabfe2/335_2021_9914_Fig1_HTML.jpg

Similar articles

Best practices for analyzing imputed genotypes from low-pass sequencing in dogs.

Mamm Genome. 2022 Mar;33(1):213-229. doi: 10.1007/s00335-021-09914-z. Epub 2021 Sep 8.

Accurate genotype imputation from low-coverage whole-genome sequencing data of rainbow trout.

G3 (Bethesda). 2024 Sep 4;14(9). doi: 10.1093/g3journal/jkae168.

Whole-genome characterization in pedigreed non-human primates using genotyping-by-sequencing (GBS) and imputation.

BMC Genomics. 2016 Aug 24;17(1):676. doi: 10.1186/s12864-016-2966-x.

Imputation to whole-genome sequence using multiple pig populations and its use in genome-wide association studies.

Genet Sel Evol. 2019 Jan 24;51(1):2. doi: 10.1186/s12711-019-0445-y.

GWAS on Imputed Whole-Genome Resequencing From Genotyping-by-Sequencing Data for Farrowing Interval of Different Parities in Pigs.

Front Genet. 2019 Oct 18;10:1012. doi: 10.3389/fgene.2019.01012. eCollection 2019.

Comparing low-pass sequencing and genotyping for trait mapping in pharmacogenetics.

BMC Genomics. 2021 Mar 20;22(1):197. doi: 10.1186/s12864-021-07508-2.

Imputation strategies for low-coverage whole-genome sequencing data and their effects on genomic prediction and genome-wide association studies in pigs.

Animal. 2024 Sep;18(9):101258. doi: 10.1016/j.animal.2024.101258. Epub 2024 Jul 25.

Examining the Impact of Imputation Errors on Fine-Mapping Using DNA Methylation QTL as a Model Trait.

Genetics. 2019 Jul;212(3):577-586. doi: 10.1534/genetics.118.301861. Epub 2019 Apr 30.

Comparison among three variant callers and assessment of the accuracy of imputation from SNP array data to whole-genome sequence level in chicken.

BMC Genomics. 2015 Oct 21;16:824. doi: 10.1186/s12864-015-2059-2.

Evaluation of the accuracy of imputed sequence variant genotypes and their utility for causal variant detection in cattle.

Genet Sel Evol. 2017 Feb 21;49(1):24. doi: 10.1186/s12711-017-0301-x.

Cited by

Large-scale genomic analysis of the domestic dog informs biological discovery.

Genome Res. 2024 Jul 23;34(6):811-821. doi: 10.1101/gr.278569.123.

Imputation of ancient canid genomes reveals inbreeding history over the past 10,000 years.

bioRxiv. 2024 Jul 3:2024.03.15.585179. doi: 10.1101/2024.03.15.585179.

A cautionary tale of low-pass sequencing and imputation with respect to haplotype accuracy.

Genet Sel Evol. 2024 Jan 12;56(1):6. doi: 10.1186/s12711-024-00875-w.

Low-pass sequencing plus imputation using avidity sequencing displays comparable imputation accuracy to sequencing by synthesis while reducing duplicates.

G3 (Bethesda). 2024 Feb 7;14(2). doi: 10.1093/g3journal/jkad276.

Best practices for genotype imputation from low-coverage sequencing data in natural populations.

Mol Ecol Resour. 2023 Aug 21. doi: 10.1111/1755-0998.13854.

Genome sequencing of 2000 canids by the Dog10K consortium advances the understanding of demography, genome function and architecture.

Genome Biol. 2023 Aug 15;24(1):187. doi: 10.1186/s13059-023-03023-7.

Skim resequencing finely maps the downy mildew resistance loci and in spinach cultivars Whale and Lazio.

Hortic Res. 2023 Apr 19;10(6):uhad076. doi: 10.1093/hr/uhad076. eCollection 2023 Jun.

GWAS using low-pass whole genome sequence reveals a novel locus in canine congenital idiopathic megaesophagus.

Mamm Genome. 2023 Sep;34(3):464-472. doi: 10.1007/s00335-023-09991-2. Epub 2023 Apr 11.

An autoencoder-based deep learning method for genotype imputation.

Front Artif Intell. 2022 Nov 3;5:1028978. doi: 10.3389/frai.2022.1028978. eCollection 2022.

Ancestry-inclusive dog genomics challenges popular breed stereotypes.

Science. 2022 Apr 29;376(6592):eabk0639. doi: 10.1126/science.abk0639.

