
Similar articles

1
Characterizing bias in population genetic inferences from low-coverage sequencing data.
Mol Biol Evol. 2014 Mar;31(3):723-35. doi: 10.1093/molbev/mst229. Epub 2013 Nov 27.
2
Fast and accurate site frequency spectrum estimation from low coverage sequence data.
Bioinformatics. 2015 Mar 1;31(5):720-7. doi: 10.1093/bioinformatics/btu725. Epub 2014 Oct 30.
3
Genotype-free estimation of allele frequencies reduces bias and improves demographic inference from RADSeq data.
Mol Ecol Resour. 2019 May;19(3):586-596. doi: 10.1111/1755-0998.12990. Epub 2019 Apr 17.
4
Estimation of allele frequency and association mapping using next-generation sequencing data.
BMC Bioinformatics. 2011 Jun 11;12:231. doi: 10.1186/1471-2105-12-231.
5
Resequencing studies of nonmodel organisms using closely related reference genomes: optimal experimental designs and bioinformatics approaches for population genomics.
Mol Ecol. 2014 Apr;23(7):1764-79. doi: 10.1111/mec.12693.
6
Generation of SNP datasets for orangutan population genomics using improved reduced-representation sequencing and direct comparisons of SNP calling algorithms.
BMC Genomics. 2014 Jan 10;15:16. doi: 10.1186/1471-2164-15-16.
7
A unified approach for allele frequency estimation, SNP detection and association studies based on pooled sequencing data using EM algorithms.
BMC Genomics. 2013;14 Suppl 1(Suppl 1):S1. doi: 10.1186/1471-2164-14-S1-S1. Epub 2013 Jan 21.
8
Fast and accurate estimation of multidimensional site frequency spectra from low-coverage high-throughput sequencing data.
Gigascience. 2022 May 17;11. doi: 10.1093/gigascience/giac032.
9
PhredEM: a phred-score-informed genotype-calling approach for next-generation sequencing studies.
Genet Epidemiol. 2017 Jul;41(5):375-387. doi: 10.1002/gepi.22048. Epub 2017 May 31.
10
The inference of sex-biased human demography from whole-genome data.
PLoS Genet. 2019 Sep 20;15(9):e1008293. doi: 10.1371/journal.pgen.1008293. eCollection 2019 Sep.

Cited by

1
Genome-environment association analysis reveals climate-driven adaptation of chickens.
Genet Sel Evol. 2025 Jul 22;57(1):43. doi: 10.1186/s12711-025-00989-9.
2
Population Genomics of Giant Mice from the Faroe Islands: Hybridization, Colonization, and a Novel Challenge to Identifying Genomic Targets of Selection.
Genome Biol Evol. 2025 Jul 30;17(8). doi: 10.1093/gbe/evaf141.
3
Genomic evidence for fisheries-induced evolution in Eastern Baltic cod.
Sci Adv. 2025 Jun 27;11(26):eadr9889. doi: 10.1126/sciadv.adr9889. Epub 2025 Jun 25.
4
Modeling Biases from Low-Pass Genome Sequencing to Enable Accurate Population Genetic Inferences.
Mol Biol Evol. 2025 Jan 6;42(1). doi: 10.1093/molbev/msaf002.
5
Recent Origin of a Range-Restricted Species With Subsequent Introgression in its Widespread Congener in the Phyteuma spicatum Group (Campanulaceae).
Mol Ecol. 2025 Feb;34(3):e17624. doi: 10.1111/mec.17624. Epub 2024 Dec 13.
6
The genomic footprint of whaling and isolation in fin whale populations.
Nat Commun. 2023 Sep 12;14(1):5465. doi: 10.1038/s41467-023-40052-z.
7
The Impact of Sample Size and Population History on Observed Mutational Spectra: A Case Study in Human and Chimpanzee Populations.
Genome Biol Evol. 2023 Mar 3;15(3). doi: 10.1093/gbe/evad019.
8
Using landscape genomics to delineate future adaptive potential for climate change in the Yosemite toad ().
Evol Appl. 2022 Dec 7;16(1):74-97. doi: 10.1111/eva.13511. eCollection 2023 Jan.
9
The impact of sequencing depth and relatedness of the reference genome in population genomic studies: A case study with two caddisfly species (Trichoptera, Rhyacophilidae, ).
Ecol Evol. 2022 Dec 12;12(12):e9583. doi: 10.1002/ece3.9583. eCollection 2022 Dec.
10
Estimation of site frequency spectra from low-coverage sequencing data using stochastic EM reduces overfitting, runtime, and memory usage.
Genetics. 2022 Nov 30;222(4). doi: 10.1093/genetics/iyac148.

Characterizing bias in population genetic inferences from low-coverage sequencing data.

Affiliation

Department of Biostatistics, University of California, Los Angeles.

Publication information

Mol Biol Evol. 2014 Mar;31(3):723-35. doi: 10.1093/molbev/mst229. Epub 2013 Nov 27.

DOI: 10.1093/molbev/mst229
PMID: 24288159
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3935184/

Abstract

The site frequency spectrum (SFS) is of primary interest in population genetic studies, because the SFS compresses variation data into a simple summary from which many population genetic inferences can proceed. However, inferring the SFS from sequencing data is challenging because genotype calls from sequencing data are often inaccurate due to high error rates. If not accounted for, this genotype uncertainty can lead to serious bias in downstream analyses based on the inferred SFS. Here, we compare two approaches to estimate the SFS from sequencing data: one approach infers individual genotypes from aligned sequencing reads and then estimates the SFS based on the inferred genotypes (call-based approach), and the other approach directly estimates the SFS from aligned sequencing reads by maximum likelihood (direct estimation approach). We find that the SFS estimated by the direct estimation approach is unbiased even at low coverage, whereas the SFS estimated by the call-based approach becomes biased as coverage decreases. The direction of the bias in the call-based approach depends on the pipeline used to infer genotypes. Estimating genotypes by pooling individuals in a sample (multisample calling) results in underestimation of the number of rare variants, whereas estimating genotypes in each individual and merging them later (single-sample calling) leads to overestimation of rare variants. We characterize the impact of these biases on downstream analyses, such as demographic parameter estimation and genome-wide selection scans. Our work highlights that, depending on the pipeline used to infer the SFS, one can reach different conclusions in population genetic inference with the same data set. Thus, careful attention to the analysis pipeline and SFS estimation procedures is vital for population genetic inferences.
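
The contrast between the call-based and direct estimation approaches can be illustrated with a toy simulation. The sketch below is not the authors' pipeline: the function names, the binomial read model, the Poisson depth, the per-individual maximum-likelihood call rule (with ties, such as zero depth, called as homozygous reference), and the EM update over sample allele counts are all assumptions chosen for illustration. It simulates low-coverage reads under a known SFS, then estimates the SFS both ways so the two can be compared against the truth.

```python
import math
import random

def genotype_likelihoods(n_alt, depth, err):
    """P(reads | g) for g = 0, 1, 2 alternate alleles (binomial read model)."""
    p_alt = (err, 0.5, 1.0 - err)  # chance that one read shows the alternate allele
    return [math.comb(depth, n_alt) * p ** n_alt * (1 - p) ** (depth - n_alt)
            for p in p_alt]

def poisson(lam, rng):
    """Draw a Poisson variate (Knuth's method); used for per-individual depth."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_sites(true_sfs, n, n_sites, mean_depth, err, rng):
    """Return, per site, the genotype likelihoods of n diploid individuals."""
    sites = []
    for _ in range(n_sites):
        k = rng.choices(range(2 * n + 1), weights=true_sfs)[0]
        chroms = [1] * k + [0] * (2 * n - k)
        rng.shuffle(chroms)  # spread the k alternate alleles over chromosomes
        gls = []
        for i in range(n):
            g = chroms[2 * i] + chroms[2 * i + 1]
            depth = poisson(mean_depth, rng)
            p_read = g / 2 * (1 - err) + (1 - g / 2) * err
            n_alt = sum(rng.random() < p_read for _ in range(depth))
            gls.append(genotype_likelihoods(n_alt, depth, err))
        sites.append(gls)
    return sites

def call_based_sfs(sites, n):
    """Call each genotype by maximum likelihood, then count alleles per site."""
    counts = [0] * (2 * n + 1)
    for gls in sites:
        k = sum(gl.index(max(gl)) for gl in gls)  # ties (e.g. depth 0) go to g = 0
        counts[k] += 1
    return [c / len(sites) for c in counts]

def site_count_likelihoods(gls):
    """P(reads at a site | k alternate alleles in the sample), k = 0..2n."""
    h = [1.0]
    for gl in gls:  # dynamic programming over individuals
        new = [0.0] * (len(h) + 2)
        for k, v in enumerate(h):
            new[k] += v * gl[0]
            new[k + 1] += v * 2 * gl[1]  # two ways to be heterozygous
            new[k + 2] += v * gl[2]
        h = new
    m = len(h) - 1
    return [v / math.comb(m, k) for k, v in enumerate(h)]

def direct_sfs(sites, n, iters=50):
    """EM estimate of the SFS from genotype likelihoods, without calling genotypes."""
    hs = [site_count_likelihoods(gls) for gls in sites]
    sfs = [1.0 / (2 * n + 1)] * (2 * n + 1)
    for _ in range(iters):
        acc = [0.0] * (2 * n + 1)
        for h in hs:
            post = [s * v for s, v in zip(sfs, h)]
            z = sum(post)
            for k, p in enumerate(post):
                acc[k] += p / z  # posterior weight of allele count k at this site
        sfs = [a / len(hs) for a in acc]
    return sfs

rng = random.Random(1)
n = 5  # diploid individuals
harmonic = sum(1 / k for k in range(1, 2 * n + 1))
true_sfs = [0.9] + [0.1 / (k * harmonic) for k in range(1, 2 * n + 1)]
sites = simulate_sites(true_sfs, n, n_sites=2000, mean_depth=2.0, err=0.01, rng=rng)
called = call_based_sfs(sites, n)
direct = direct_sfs(sites, n)
print(" k   true  called  direct")
for k in range(2 * n + 1):
    print(f"{k:2d}  {true_sfs[k]:.3f}   {called[k]:.3f}   {direct[k]:.3f}")
```

Raising `mean_depth` should bring the two estimates closer together, since genotype calls become more reliable at higher coverage. The direction and size of the call-based bias in this toy depend on the assumed call rule, so it should not be read as reproducing the paper's results.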
