Study design and the sampling of deleterious rare variants in biobank-scale datasets.

Authors

Steiner Margaret C, Rice Daniel P, Biddanda Arjun, Ianni-Ravn Mariadaria K, Porras Christian, Novembre John

Affiliations

Department of Human Genetics, University of Chicago, Chicago, IL 60637.

Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.

Publication information

bioRxiv. 2025 Jan 29:2024.12.02.626424. doi: 10.1101/2024.12.02.626424.

Abstract

One key component of study design in population genetics is the "geographic breadth" of a sample (i.e., how broad a region individuals are sampled across). How the geographic breadth of a sample impacts observations of rare, deleterious variants is unclear, even though such variants are of particular interest for biomedical and evolutionary applications. Here, to gain insight into the effects of sample design on ascertained genetic variants, we formulate a stochastic model of dispersal, genetic drift, selection, mutation, and geographically concentrated sampling. We use this model to understand the effects of the geographic breadth of sampling effort on the discovery of negatively selected variants. We find that geographically broader samples discover a greater number of variants than geographically narrow samples (an effect we label "discovery"), though those variants are detected at a lower average frequency than in narrow samples (e.g., as singletons; an effect we label "dilution"). Importantly, these effects are amplified for larger sample sizes and moderated by the magnitude of fitness effects. We validate these results using both population genetic simulations and empirical analyses in the UK Biobank. Our results are particularly important in two contexts: the association of large-effect rare variants with particular phenotypes and the inference of negative selection from allele frequency data. Overall, our findings emphasize the importance of considering geographic breadth when designing and carrying out genetic studies, especially at biobank scale.
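The "discovery" and "dilution" effects can be illustrated with a deliberately simplified calculation. The sketch below is not the stochastic model described in the abstract: it assumes, purely for illustration, that every rare variant is confined to a single deme (no dispersal), that each sits at the same low local frequency (standing in for the effect of negative selection), and that a fixed number of diploid individuals is split evenly across the sampled demes. The function name and all parameter values are hypothetical.

```python
def expected_discovery(n_sampled, n_demes_sampled,
                       variants_per_deme=1000, local_freq=0.001):
    """Toy expectation for a fixed sample split evenly across demes.

    Assumptions (illustrative only, not taken from the paper):
    - each variant exists in exactly one deme at frequency `local_freq`;
    - individuals are diploid and sampled evenly across the chosen demes;
    - allele counts in the sample are binomially distributed.
    Returns (expected number of variants discovered,
             mean sample allele count among discovered variants).
    """
    # Chromosomes drawn from each sampled deme.
    chromosomes_per_deme = 2 * n_sampled / n_demes_sampled
    # Probability that a local variant appears at least once in the sample.
    p_detect = 1.0 - (1.0 - local_freq) ** chromosomes_per_deme
    # Only variants from sampled demes can be discovered in this toy setup.
    n_discovered = n_demes_sampled * variants_per_deme * p_detect
    # E[count | count >= 1] = E[count] / P(count >= 1) for a binomial count.
    mean_count = chromosomes_per_deme * local_freq / p_detect
    return n_discovered, mean_count


# Same total sample size, concentrated in 1 deme versus spread over 50.
narrow = expected_discovery(n_sampled=10_000, n_demes_sampled=1)
broad = expected_discovery(n_sampled=10_000, n_demes_sampled=50)

print(f"narrow: {narrow[0]:.0f} variants, mean allele count {narrow[1]:.2f}")
print(f"broad:  {broad[0]:.0f} variants, mean allele count {broad[1]:.2f}")
```

Under these assumptions the broad design finds more distinct variants, but each discovered variant is seen in fewer copies, closer to a singleton. With dispersal and variable fitness effects, as in the authors' model, the size of both effects would differ.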

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d908/11781414/31bd7cff043a/nihpp-2024.12.02.626424v2-f0001.jpg
