
Pseudoreplication in genomic-scale data sets.

Affiliations

NOAA Fisheries, Northwest Fisheries Science Center, Seattle, WA, USA.

Department of Biology, Section for Computational and RNA Biology, University of Copenhagen, Copenhagen, Denmark.

Publication information

Mol Ecol Resour. 2022 Feb;22(2):503-518. doi: 10.1111/1755-0998.13482. Epub 2021 Sep 7.

DOI: 10.1111/1755-0998.13482
PMID: 34351073
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9415146/

Abstract

In genomic-scale data sets, loci are closely packed within chromosomes and hence provide correlated information. Averaging across loci as if they were independent creates pseudoreplication, which reduces the effective degrees of freedom (df') compared to the nominal degrees of freedom, df. This issue has been known for some time, but consequences have not been systematically quantified across the entire genome. Here, we measured pseudoreplication (quantified by the ratio df'/df) for a common metric of genetic differentiation (F_ST) and a common measure of linkage disequilibrium between pairs of loci (r^2). Based on data simulated using models (SLiM and msprime) that allow efficient forward-in-time and coalescent simulations while precisely controlling population pedigrees, we estimated df' and df'/df by measuring the rate of decline in the variance of mean F_ST and mean r^2 as more loci were used. For both indices, df' increases with N_e and genome size, as expected. However, even for large N_e and large genomes, df' for mean r^2 plateaus after a few thousand loci, and a variance components analysis indicates that the limiting factor is uncertainty associated with sampling individuals rather than genes. Pseudoreplication is less extreme for F_ST, but df'/df ≤ 0.01 can occur in data sets using tens of thousands of loci. Commonly used block-jackknife methods consistently overestimated var(F_ST), producing very conservative confidence intervals. Predicting df' based on our modelling results as a function of N_e, L, S, and genome size provides a robust way to quantify precision associated with genomic-scale data sets.
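
The idea of an effective number of independent loci can be illustrated with a toy calculation. The sketch below is not the authors' pipeline: it does not use SLiM or msprime, and it does not compute F_ST or r^2. It assumes a hypothetical per-locus statistic that follows an AR(1) process along a chromosome (adjacent loci correlated by `rho`, marginal variance 1) and uses the generic relationship var(mean) = var(single locus) / df' to back out df' from replicate simulations. The function names and parameters are illustrative assumptions.

```python
import random
import statistics


def simulate_locus_values(n_loci, rho, rng):
    """Toy per-locus statistic: AR(1) series with lag-1 correlation rho.

    Assumption for illustration only: marginal variance is 1 at every locus.
    """
    values = [rng.gauss(0.0, 1.0)]
    innovation_sd = (1.0 - rho ** 2) ** 0.5
    for _ in range(n_loci - 1):
        values.append(rho * values[-1] + innovation_sd * rng.gauss(0.0, 1.0))
    return values


def effective_df(n_loci, rho, n_replicates=1000, seed=1):
    """Estimate df' as var(single locus) / var(mean across loci).

    With independent loci, var(mean) = 1 / n_loci, so df' is close to n_loci.
    With correlated loci, var(mean) shrinks more slowly and df' < n_loci.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(simulate_locus_values(n_loci, rho, rng))
        for _ in range(n_replicates)
    ]
    single_locus_variance = 1.0  # true by construction in this toy model
    return single_locus_variance / statistics.variance(means)


if __name__ == "__main__":
    n_loci = 200
    for rho in (0.0, 0.5, 0.9):
        df_prime = effective_df(n_loci, rho)
        # For an AR(1) series, the large-n limit of df'/df is (1 - rho) / (1 + rho).
        print(f"rho={rho}: df'={df_prime:.1f}, df'/df={df_prime / n_loci:.3f}")
```

In this toy setting, stronger correlation between neighbouring loci lowers df'/df, which mirrors the qualitative point of the abstract: adding more closely linked loci yields diminishing gains in precision.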


