Delord Chrystelle, Arnaud-Haond Sophie, Leone Agostino, Noskova Ekaterina, Tournebize Rémi, Jacques Patrick, Marsac Francis, Nikolic Natacha
MARBEC, Univ Montpellier, CNRS, Ifremer, IRD, La Réunion, France.
MARBEC, Univ Montpellier, CNRS, Ifremer, IRD, Sète, France.
Evol Appl. 2025 Jul 28;18(8):e70121. doi: 10.1111/eva.70121. eCollection 2025 Aug.
Next-generation sequencing has broadened perspectives on the estimation of the effective population size (Ne) by providing high-density genomic information. These technologies have expanded data collection and analytical tools in population genetics, improving our understanding of highly abundant populations, such as marine species of high commercial or conservation priority. Several common methods for estimating Ne are based on allele frequency spectra or on linkage disequilibrium between loci. However, their specific constraints make them difficult to apply to large populations, especially in the presence of confounding factors such as migration, complex sampling schemes or non-independence between loci. Computer simulations have long been invaluable tools for exploring the influence of biological or logistical factors on Ne estimation and for assessing the robustness of dedicated methods. Here, we outline several Ne estimation methods and their foundational principles, requirements and likely caveats when applied to populations of high abundance. We then present a simulation framework built upon recent computational genomic tools that can generate biologically realistic data sets with realistic patterns of long-term neutral genetic diversity. This framework aims to reproduce and track the main critical features of data derived from a large natural population when running a simulation-based population genetics study, for example, when evaluating the strengths and limitations of various Ne estimation methods. We illustrate this framework by generating genotype data sets with varying sample sizes and numbers of loci and analysing them with three software tools (NeEstimator2, GONE and GADMA). Detailed and annotated simulation scripts are provided to ensure reproducibility and to support future research on Ne estimation. These resources can support method comparisons and validations, particularly for non-specialists such as conservation practitioners and students.