Inference of the distribution of fitness effects of mutations is affected by single nucleotide polymorphism filtering methods, sample size and population structure.

Affiliations

Department of Ecology and Environmental Sciences, Umeå University, Umeå, Sweden.

Department of Computational Biology, Cornell University, Ithaca, New York, USA.

Publication information

Mol Ecol Resour. 2023 Oct;23(7):1589-1603. doi: 10.1111/1755-0998.13825. Epub 2023 Jun 20.

DOI:10.1111/1755-0998.13825

PMID:37340611

Abstract

The distribution of fitness effects (DFE) of new mutations has been of interest to evolutionary biologists since the concept of mutations arose. Modern population genomic data enable us to quantify the DFE empirically, but few studies have examined how data processing, sample size and cryptic population structure might affect the accuracy of DFE inference. We used simulated and empirical data (from Arabidopsis lyrata) to show the effects of missing data filtering, sample size, number of single nucleotide polymorphisms (SNPs) and population structure on the accuracy and variance of DFE estimates. Our analyses focus on three filtering methods (downsampling, imputation and subsampling), with sample sizes of 4 to 100 individuals. We show that (1) the choice of missing-data treatment directly affects the estimated DFE, with downsampling performing better than imputation and subsampling; (2) the estimated DFE is less reliable in small samples (<8 individuals), and becomes unpredictable with too few SNPs (<5000, the sum of 0-fold and 4-fold SNPs); and (3) population structure may skew the inferred DFE towards more strongly deleterious mutations. We suggest that future studies should consider downsampling for small data sets, and use samples of more than 4 (ideally more than 8) individuals, with more than 5000 SNPs, in order to improve the robustness of DFE inference and enable comparative analyses.
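The abstract does not say how downsampling was implemented. As a rough illustration only, one common way to downsample sites with missing genotypes is hypergeometric projection of each site's allele counts to a fixed, smaller number of chromosomes. The sketch below assumes that approach; the function names `project_site` and `downsample_sfs` and the `(derived, called)` input layout are hypothetical and are not taken from the paper.

```python
from math import comb


def project_site(derived, called, target):
    """Probabilities of observing 0..target derived alleles when `target`
    chromosomes are drawn without replacement from `called` chromosomes,
    `derived` of which carry the derived allele (hypergeometric)."""
    if target > called:
        raise ValueError("target sample size exceeds called chromosomes")
    total = comb(called, target)
    return [
        comb(derived, j) * comb(called - derived, target - j) / total
        for j in range(target + 1)
    ]


def downsample_sfs(sites, target):
    """Build an expected site frequency spectrum of size target + 1.

    `sites` is an iterable of (derived_count, called_chromosomes) pairs.
    Sites with fewer than `target` called chromosomes are skipped
    (an assumption about how missing data is handled)."""
    sfs = [0.0] * (target + 1)
    for derived, called in sites:
        if called < target:
            continue  # too much missing data at this site
        for j, p in enumerate(project_site(derived, called, target)):
            sfs[j] += p
    return sfs


# Three hypothetical sites; the last has only 3 called chromosomes.
example_sites = [(1, 4), (2, 6), (3, 3)]
example_sfs = downsample_sfs(example_sites, target=4)
```

In this sketch each retained site contributes a total weight of 1 to the spectrum, so the entries sum to the number of sites that had at least `target` called chromosomes.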

Similar articles

Inference of the Distribution of Selection Coefficients for New Nonsynonymous Mutations Using Large Samples.

Genetics. 2017 May;206(1):345-361. doi: 10.1534/genetics.116.197145. Epub 2017 Mar 1.

The distribution of mutational effects on fitness in Caenorhabditis elegans inferred from standing genetic variation.

Genetics. 2022 Jan 4;220(1). doi: 10.1093/genetics/iyab166.

Joint inference of the distribution of fitness effects of deleterious mutations and population demography based on nucleotide polymorphism frequencies.

Genetics. 2007 Dec;177(4):2251-61. doi: 10.1534/genetics.107.080663.

Effects of new mutations on fitness: insights from models and data.

Ann N Y Acad Sci. 2014 Jul;1320(1):76-92. doi: 10.1111/nyas.12460. Epub 2014 May 30.

polyDFE: Inferring the Distribution of Fitness Effects and Properties of Beneficial Mutations from Polymorphism Data.

Methods Mol Biol. 2020;2090:125-146. doi: 10.1007/978-1-0716-0199-0_6.

New Methods for Inferring the Distribution of Fitness Effects for INDELs and SNPs.

Mol Biol Evol. 2018 Jun 1;35(6):1536-1546. doi: 10.1093/molbev/msy054.

Inferring Genome-Wide Correlations of Mutation Fitness Effects between Populations.

Mol Biol Evol. 2021 Sep 27;38(10):4588-4602. doi: 10.1093/molbev/msab162.

Inference of Distribution of Fitness Effects and Proportion of Adaptive Substitutions from Polymorphism Data.

Genetics. 2017 Nov;207(3):1103-1119. doi: 10.1534/genetics.117.300323. Epub 2017 Sep 25.

Determining the factors driving selective effects of new nonsynonymous mutations.

Proc Natl Acad Sci U S A. 2017 Apr 25;114(17):4465-4470. doi: 10.1073/pnas.1619508114. Epub 2017 Apr 11.

Cited by

Demographic History, Genetic Load, and the Efficacy of Selection in the Globally Invasive Mosquito Aedes aegypti.

Genome Biol Evol. 2025 Apr 3;17(4). doi: 10.1093/gbe/evaf066.

Next-generation data filtering in the genomics era.

Nat Rev Genet. 2024 Nov;25(11):750-767. doi: 10.1038/s41576-024-00738-6. Epub 2024 Jun 14.

Demographic history and the efficacy of selection in the globally invasive mosquito .

bioRxiv. 2024 Mar 12:2024.03.07.584008. doi: 10.1101/2024.03.07.584008.
