Rau Christoph D, Bradley Patrick H
Department of Genetics and Computational Medicine Program, University of North Carolina at Chapel Hill.
Department of Microbiology, The Ohio State University.
bioRxiv. 2025 Jun 3:2025.05.31.657208. doi: 10.1101/2025.05.31.657208.
Quantitative genetics methods can be particularly powerful in model organisms and non-human populations, where strain collections such as recombinant inbred lines are now available for phenotyping. Natural diversity is also valuable in non-model systems that do not yet have reverse genetic tools. However, purchasing and phenotyping large collections can be cost-prohibitive, and acquisition costs may vary dramatically among strains or isolates. Thus, investigators need efficient strategies to maximize experimental power under a limited budget. In this study, we evaluate several approaches for selecting subsets of the total cohort that best maintain power in genome-wide association studies (GWAS). Some approaches focus solely on cost, others on genetic diversity, and some on both simultaneously. Through simulation studies across different minor allele frequencies (MAF) and SNP effect sizes, we demonstrate that selecting for cost is most beneficial at low-to-moderate budget thresholds, while selecting for diversity is optimal in scenarios involving rare (MAF 5-10%) variants or higher total costs, or when accounting for additional costs per strain studied. We also evaluate these approaches on data from the Hybrid Mouse Diversity Panel (HMDP) and find that an approach that considers both cost and diversity is superior at recovering significant loci and maintaining statistical power under real-world conditions. This approach, which we term "ThriftyMD" (for "Thrifty Minimum Distance"), picks the strains that, for a given budget, minimize the total genetic distance to the strains that were not selected; it extends previous distance-based methods for picking a representative panel by explicitly adding a cost constraint. Overall, our results highlight the trade-offs between cost, diversity, and power in GWAS cohort design, and present the ThriftyMD algorithm as a versatile and robust approach for optimizing study design in resource-limited settings.
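As a rough illustration of the cost-constrained, distance-based selection described above, the following Python sketch picks a panel greedily. It is not the authors' implementation: the abstract does not give the exact objective or optimizer, so the sketch assumes the objective is the sum, over unselected strains, of the distance to the nearest selected strain, and it assumes a simple greedy search. The function names, the toy distance matrix, and the costs are all hypothetical.

```python
import numpy as np


def total_distance_to_panel(dist, selected):
    """Assumed objective: for each unselected strain, take the distance to its
    nearest selected strain, then sum over all unselected strains."""
    if not selected:
        return float("inf")
    chosen = sorted(selected)
    unselected = [i for i in range(dist.shape[0]) if i not in selected]
    if not unselected:
        return 0.0
    return float(dist[np.ix_(unselected, chosen)].min(axis=1).sum())


def greedy_budgeted_panel(dist, costs, budget):
    """Greedy heuristic (an assumption, not the published method): repeatedly
    add the affordable strain that yields the lowest objective, until no
    remaining strain fits within the budget."""
    n = dist.shape[0]
    selected = set()
    spent = 0.0
    while True:
        affordable = [
            i for i in range(n)
            if i not in selected and spent + costs[i] <= budget
        ]
        if not affordable:
            break
        best = min(
            affordable,
            key=lambda i: total_distance_to_panel(dist, selected | {i}),
        )
        selected.add(best)
        spent += costs[best]
    return sorted(selected), spent


# Toy example: two tight clusters, {0, 1} and {2, 3}, far from each other.
toy_dist = np.array([
    [0.0, 1.0, 10.0, 10.0],
    [1.0, 0.0, 10.0, 10.0],
    [10.0, 10.0, 0.0, 1.0],
    [10.0, 10.0, 1.0, 0.0],
])
toy_costs = [5.0, 1.0, 1.0, 5.0]  # strains 1 and 2 are the cheap ones

panel, spent = greedy_budgeted_panel(toy_dist, toy_costs, budget=2.0)
print(panel, spent)
```

With a budget of 2, the sketch selects one inexpensive strain from each cluster, so every unselected strain has a close relative in the panel. A greedy search of this kind is not guaranteed to find the optimal panel; it only shows how a cost constraint and a distance objective can be combined.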