Estimating inbreeding coefficients from NGS data: Impact on genotype calling and allele frequency estimation.

Affiliation

Department of Integrative Biology, University of California, Berkeley, Berkeley, California 94720, USA;

Publication information

Genome Res. 2013 Nov;23(11):1852-61. doi: 10.1101/gr.157388.113. Epub 2013 Aug 15.

DOI:10.1101/gr.157388.113

PMID:23950147

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3814885/

Abstract

Most methods for next-generation sequencing (NGS) data analysis incorporate information about allele frequencies by using the assumption of Hardy-Weinberg equilibrium (HWE) as a prior. However, many organisms, including those that are domesticated, partially selfing, or with asexual life cycles, show strong deviations from HWE. For such species, and especially for low-coverage data, it is necessary to obtain estimates of inbreeding coefficients (F) for each individual before calling genotypes. Here, we present two methods for estimating inbreeding coefficients from NGS data based on an expectation-maximization (EM) algorithm. We assess the impact of taking inbreeding into account when calling genotypes or estimating the site frequency spectrum (SFS), and demonstrate a marked increase in accuracy on low-coverage, highly inbred samples. We demonstrate the applicability and efficacy of these methods in both simulated and real data sets.
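To make the idea in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation, of how an inbreeding coefficient changes the genotype prior and how an EM iteration can estimate F for one individual from genotype likelihoods. It rests on stated assumptions: sites are biallelic, allele frequencies are treated as known (in practice they would have to be estimated as well), and the update is the textbook mixture formulation in which, with probability F, the two alleles at a site are identical by descent. The function names `genotype_prior` and `estimate_F` are hypothetical and chosen only for illustration.

```python
import numpy as np


def genotype_prior(freq, F):
    """Prior over genotypes (0, 1, 2 copies of an allele with frequency `freq`)
    for an individual with inbreeding coefficient `F`.
    F = 0 gives Hardy-Weinberg proportions; F = 1 gives no heterozygotes."""
    freq = np.asarray(freq, dtype=float)
    hom_ref = (1.0 - freq) ** 2 + freq * (1.0 - freq) * F
    het = 2.0 * freq * (1.0 - freq) * (1.0 - F)
    hom_alt = freq ** 2 + freq * (1.0 - freq) * F
    return np.stack([hom_ref, het, hom_alt], axis=-1)


def estimate_F(gl, freq, F_init=0.1, n_iter=500, tol=1e-9):
    """EM estimate of one individual's inbreeding coefficient.

    gl   : (n_sites, 3) genotype likelihoods P(reads | genotype = 0, 1, 2)
    freq : (n_sites,) allele frequencies, assumed known in this sketch
    """
    gl = np.asarray(gl, dtype=float)
    freq = np.asarray(freq, dtype=float)
    # Site likelihood if both alleles are identical by descent (always homozygous).
    lik_ibd = (1.0 - freq) * gl[:, 0] + freq * gl[:, 2]
    # Site likelihood under Hardy-Weinberg proportions (F = 0).
    lik_hwe = (genotype_prior(freq, 0.0) * gl).sum(axis=1)
    F = F_init
    for _ in range(n_iter):
        # E-step: posterior probability that each site is identical by descent.
        w = F * lik_ibd / (F * lik_ibd + (1.0 - F) * lik_hwe)
        # M-step: the new F is the average of those posteriors.
        F_new = float(w.mean())
        if abs(F_new - F) < tol:
            return F_new
        F = F_new
    return F
```

Once F is available, `genotype_prior(freq, F)` can replace the HWE prior when computing genotype posteriors, which is the step the abstract argues matters most for low-coverage, highly inbred samples.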

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c4f1/3814885/bb0d46b6197d/1852fig1.jpg
