Ward Eric J, Waples Robin S
Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, 2725 Montlake Blvd. East, Seattle, WA 98112, USA.
School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98195, USA.
Entropy (Basel). 2024 Sep 21;26(9):805. doi: 10.3390/e26090805.
Generating vast arrays of genetic markers for evolutionary ecology studies has become routine and cost-effective. However, analyzing data from large numbers of loci spread across a small, finite number of chromosomes introduces a challenge: loci on the same chromosome do not assort independently, which leads to pseudoreplication. Previous studies have demonstrated that pseudoreplication can substantially reduce the precision of genetic analyses (and make confidence intervals wider), such as and linkage disequilibrium (LD) measures between pairs of loci. In LD analyses, another type of dependency (overlapping pairs of the same loci) also creates pseudoreplication. Building on previous work, we explore the potential of entropy metrics, particularly total correlation (TC), to improve the status quo in assessing pseudoreplication in LD studies. Our simulations, performed on a monoecious population across a range of effective population sizes (Ne) and numbers of loci, attempted to isolate the overlapping-pairs-of-loci effect by considering unlinked loci and using entropy to quantify inter-locus relationships. We hypothesized a positive correlation between TC and the number of loci (L) and a negative correlation between TC and Ne. Results from our statistical models predicting TC demonstrate a strong effect of the number of loci and muted effects of Ne and other predictors, supporting the use of entropy-based metrics as a tool for estimating the statistical information in complex genetic datasets. Our results also highlight a scalability challenge: computational limitations arise as the number of loci grows, which currently restricts our approach to smaller datasets. Despite these challenges, this work further refines our understanding of entropy measures and offers insights into the complex dynamics of genetic information in evolutionary ecology research.
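The abstract does not state which entropy estimator was used. As a minimal sketch, assuming a plug-in (maximum-likelihood) Shannon entropy and genotypes coded 0/1/2 per locus, total correlation can be computed as the sum of the per-locus marginal entropies minus the joint entropy of the multilocus genotypes, TC = Σ H(X_i) − H(X_1, …, X_L). The function names `entropy` and `total_correlation`, the genotype coding, and the simulated allele frequency of 0.5 are hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random
from collections import Counter
from typing import Hashable, Sequence


def entropy(symbols: Sequence[Hashable]) -> float:
    """Plug-in (maximum-likelihood) Shannon entropy, in bits."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def total_correlation(genotypes: Sequence[Sequence[int]]) -> float:
    """Total correlation TC = sum_i H(X_i) - H(X_1, ..., X_L).

    `genotypes` is a list of individuals, each a sequence of L genotype
    codes (assumed here to be 0/1/2 copies of one allele).
    """
    n_loci = len(genotypes[0])
    # Marginal entropy of each locus, computed column by column.
    marginal = sum(entropy([ind[j] for ind in genotypes]) for j in range(n_loci))
    # Joint entropy: each individual's full multilocus genotype is one symbol.
    joint = entropy([tuple(ind) for ind in genotypes])
    return marginal - joint


if __name__ == "__main__":
    rng = random.Random(1)
    n_individuals = 50
    for n_loci in (2, 4, 8, 16):
        # Unlinked loci simulated independently: each genotype is the sum of
        # two allele draws, assuming an allele frequency of 0.5.
        sample = [
            [rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_loci)]
            for _ in range(n_individuals)
        ]
        print(n_loci, round(total_correlation(sample), 3))
```

With a plug-in estimator, the joint entropy of n sampled individuals can never exceed log2(n) bits, while the summed marginal entropies keep growing with L. Even for independently simulated loci, the estimated TC therefore tends to rise with the number of loci. This is a property of the sketch's estimator under finite sampling and illustrates why scaling to many loci is difficult; it is not a reproduction of the paper's results.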