A survey of methods and tools to detect recent and strong positive selection.

Author information

Pavlidis Pavlos, Alachiotis Nikolaos

Affiliation

Institute of Computer Science, Foundation for Research and Technology-Hellas, 70013 Crete, Greece.

Publication information

J Biol Res (Thessalon). 2017 Apr 8;24:7. doi: 10.1186/s40709-017-0064-0. eCollection 2017 Dec.

DOI:10.1186/s40709-017-0064-0

PMID:28405579

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5385031/

Abstract

Positive selection occurs when an allele is favored by natural selection. The frequency of the favored allele increases in the population and due to genetic hitchhiking the neighboring linked variation diminishes, creating so-called selective sweeps. Detecting traces of positive selection in genomes is achieved by searching for signatures introduced by selective sweeps, such as regions of reduced variation, a specific shift of the site frequency spectrum, and particular LD patterns in the region. A variety of methods and tools can be used for detecting sweeps, ranging from simple implementations that compute summary statistics such as Tajima's D, to more advanced statistical approaches that use combinations of statistics, maximum likelihood, machine learning etc. In this survey, we present and discuss summary statistics and software tools, and classify them based on the selective sweep signature they detect, i.e., SFS-based vs. LD-based, as well as their capacity to analyze whole genomes or just subgenomic regions. Additionally, we summarize the results of comparisons among four open-source software releases (SweeD, SweepFinder, SweepFinder2, and OmegaPlus) regarding sensitivity, specificity, and execution times. In equilibrium neutral models or mild bottlenecks, both SFS- and LD-based methods are able to detect selective sweeps accurately. Methods and tools that rely on LD exhibit higher true positive rates than SFS-based ones under the model of a single sweep or recurrent hitchhiking. However, their false positive rate is elevated when a misspecified demographic model is used to represent the null hypothesis. When the correct (or similar to the correct) demographic model is used instead, the false positive rates are considerably reduced. The accuracy of detecting the true target of selection is decreased in bottleneck scenarios. In terms of execution time, LD-based methods are typically faster than SFS-based methods, due to the nature of required arithmetic.
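As a small illustration of the simplest class of methods mentioned in the abstract, the sketch below computes Tajima's D from a set of aligned haplotypes. It follows the standard published formula for the statistic. The function name `tajimas_d` and the 0/1 string input format are assumptions made for this example and do not come from any of the surveyed tools. A sweep that removes variation and leaves an excess of rare variants tends to push D below zero, which is the SFS shift the abstract refers to.

```python
from math import sqrt


def tajimas_d(haplotypes):
    """Return Tajima's D for aligned haplotypes coded 0 (ancestral) / 1 (derived).

    `haplotypes` is a list of equal-length strings or sequences of 0/1 alleles
    (a hypothetical input format chosen for this sketch).
    Returns None when there are no segregating sites, since D is undefined.
    """
    n = len(haplotypes)
    if n < 4:
        # For n < 4 the variance term is zero, so D cannot be computed.
        raise ValueError("need at least 4 haplotypes")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must have equal length")

    # Derived-allele count at each segregating site (column of the alignment).
    counts = []
    for site in zip(*haplotypes):
        k = sum(int(a) for a in site)
        if 0 < k < n:
            counts.append(k)
    S = len(counts)
    if S == 0:
        return None

    # pi: mean number of pairwise differences between haplotypes.
    pi = sum(2.0 * k * (n - k) for k in counts) / (n * (n - 1))

    # Constants from the standard definition of the statistic.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between the two diversity estimators, scaled by its std. dev.
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


# Toy data: four singleton sites versus four intermediate-frequency sites.
rare = ["1000", "0100", "0010", "0001"]
common = ["1100", "1100", "0011", "0011"]
d_rare = tajimas_d(rare)      # negative: excess of rare variants
d_common = tajimas_d(common)  # positive: excess of intermediate-frequency variants
```

In a genome scan this kind of statistic is typically evaluated in sliding windows, and, as the abstract stresses, its null distribution depends on the demographic model, so thresholds derived under the standard neutral model can produce false positives.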



Similar articles


Detecting Positive Selection in Populations Using Genetic Data.

Methods Mol Biol. 2020;2090:87-123. doi: 10.1007/978-1-0716-0199-0_5.

SweeD: likelihood-based detection of selective sweeps in thousands of genomes.

Mol Biol Evol. 2013 Sep;30(9):2224-34. doi: 10.1093/molbev/mst112. Epub 2013 Jun 18.

A new inference method for detecting an ongoing selective sweep.

Genes Genet Syst. 2018 Nov 10;93(4):149-161. doi: 10.1266/ggs.18-00008. Epub 2018 Sep 30.

Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models.

bioRxiv. 2023 Jun 15:2023.06.15.545166. doi: 10.1101/2023.06.15.545166.

Machine-Learning Prospects for Detecting Selection Signatures Using Population Genomics Data.

J Comput Biol. 2022 Sep;29(9):943-960. doi: 10.1089/cmb.2021.0447. Epub 2022 May 30.

Genetic diversity during selective sweeps in non-recombining populations.

bioRxiv. 2024 Sep 18:2024.09.12.612756. doi: 10.1101/2024.09.12.612756.

Soft shoulders ahead: spurious signatures of soft and partial selective sweeps result from linked hard sweeps.

Genetics. 2015 May;200(1):267-84. doi: 10.1534/genetics.115.174912. Epub 2015 Feb 25.

Linkage disequilibrium as a signature of selective sweeps.

Genetics. 2004 Jul;167(3):1513-24. doi: 10.1534/genetics.103.025387.

A Likelihood Approach for Uncovering Selective Sweep Signatures from Haplotype Data.

Mol Biol Evol. 2020 Oct 1;37(10):3023-3046. doi: 10.1093/molbev/msaa115.

Cited by

Longitudinal sequencing reveals polygenic and epistatic nature of genomic response to selection.

Proc Natl Acad Sci U S A. 2025 Jun 24;122(25):e2410452122. doi: 10.1073/pnas.2410452122. Epub 2025 Jun 18.

Causes of heterozygosity excess: The case of Mexican populations of .

Plant Divers. 2024 Dec 31;47(3):415-428. doi: 10.1016/j.pld.2024.12.006. eCollection 2025 May.

Developing a crop- wild-reservoir pathogen system to understand pathogen evolution and emergence.

Elife. 2025 Apr 11;14:e91245. doi: 10.7554/eLife.91245.

Fast and accurate deep learning scans for signatures of natural selection in genomes using FASTER-NN.

Commun Biol. 2025 Jan 15;8(1):58. doi: 10.1038/s42003-025-07480-7.

Repeated global adaptation across plant species.

Proc Natl Acad Sci U S A. 2024 Dec 24;121(52):e2406832121. doi: 10.1073/pnas.2406832121. Epub 2024 Dec 20.

Data-driven guidelines for phylogenomic analyses using SNP data.

Appl Plant Sci. 2024 Aug 9;12(6):e11611. doi: 10.1002/aps3.11611. eCollection 2024 Nov-Dec.

Genome-Wide Analysis of Genetic Diversity and Selection Signatures in Zaobei Beef Cattle.

Animals (Basel). 2024 Aug 22;14(16):2447. doi: 10.3390/ani14162447.

Genome-wide analysis reveals genomic diversity and signatures of selection in Qinchuan beef cattle.

BMC Genomics. 2024 Jun 5;25(1):558. doi: 10.1186/s12864-024-10482-0.

Temporal challenges in detecting balancing selection from population genomic data.

G3 (Bethesda). 2024 Jun 5;14(6). doi: 10.1093/g3journal/jkae069.

Landscape genomics reveals regions associated with adaptive phenotypic and genetic variation in Ethiopian indigenous chickens.

BMC Genomics. 2024 Mar 18;25(1):284. doi: 10.1186/s12864-024-10193-6.

