SweepCluster: A SNP clustering tool for detecting gene-specific sweeps in prokaryotes.

Affiliations

State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei Collaborative Innovation Center for Green Transformation of Bio-Resources, Hubei Key Laboratory of Industrial Biotechnology, School of Life Sciences, Hubei University, Wuhan, 430062, China.

School of Computer Science and Engineering, Guangdong Province Key Laboratory of Computational Science, and National Engineering Laboratory for Big Data Analysis and Application, Sun Yat-Sen University, Guangzhou, 510275, China.

Publication information

BMC Bioinformatics. 2022 Jan 6;23(1):19. doi: 10.1186/s12859-021-04533-6.

DOI:10.1186/s12859-021-04533-6

PMID:34991447

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8734265/

Abstract

BACKGROUND

The gene-specific sweep is a selection process in which an advantageous mutation, along with nearby neutral sites in a gene region, increases in frequency in the population. It has been shown to play important roles in ecological differentiation or phenotypic divergence in microbial populations. Therefore, identifying gene-specific sweeps in microorganisms will not only provide insights into evolutionary mechanisms but also reveal potential genetic markers associated with biological phenotypes. However, current methods were mainly developed for detecting selective sweeps in eukaryotic data of sparse genotypes and are not readily applicable to prokaryotic data. Furthermore, these methods have not sufficiently addressed some challenges, such as the low spatial resolution of sweep regions and the lack of consideration of the spatial distribution of mutations.

RESULTS

We proposed a novel gene-centric and spatial-aware approach for identifying gene-specific sweeps in prokaryotes and implemented it in a Python tool, SweepCluster. Our method searches for gene regions with a high level of spatial clustering of pre-selected polymorphisms in genotype datasets, assuming a null distribution model of neutral selection. The pre-selection of polymorphisms is based on their genetic signatures, such as elevated population subdivision, excessive linkage disequilibrium, or significant phenotype association. Performance evaluation using simulation data showed that the sensitivity and specificity of the clustering algorithm in SweepCluster are above 90%. Application of SweepCluster to two real datasets from the bacteria Streptococcus pyogenes and Streptococcus suis showed that pre-selection had a dramatic impact and significantly reduced uninformative signals. We validated our method using the genotype data from Vibrio cyclitrophicus, the only available dataset of gene-specific sweeps in bacteria, and obtained a concordance rate of 78%. We noted that the concordance rate could be underestimated because of differences in reference genomes and clustering strategies. Application to human genotype datasets showed that SweepCluster is also applicable to eukaryotic data and is able to recover 80% of a catalog of known sweep regions.
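To make the idea of testing gene regions for spatial clustering against a null model more concrete, the following is a minimal sketch and not the SweepCluster implementation. It assumes, purely for illustration, a null model in which pre-selected SNPs fall uniformly at random along the genome, so that the count in a gene follows a binomial distribution. The function names, the toy positions, and the gene coordinates are all hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def clustering_pvalues(snp_positions, genes, genome_length):
    """Score each gene for an excess of pre-selected SNPs.

    Assumed null model (illustrative only): each SNP lands uniformly at
    random on the genome, so the number of SNPs in a gene of length L is
    Binomial(n, L / genome_length).
    Returns {gene_name: (snp_count, p_value)}.
    """
    n = len(snp_positions)
    results = {}
    for name, (start, end) in genes.items():
        # Count pre-selected SNPs inside the gene (inclusive coordinates).
        k = sum(start <= pos <= end for pos in snp_positions)
        # Probability that a single SNP falls in this gene under the null.
        p = (end - start + 1) / genome_length
        results[name] = (k, binomial_tail(k, n, p))
    return results


# Hypothetical toy data: eight pre-selected SNPs on a 20 kb genome.
snps = [1010, 1050, 1100, 1180, 1220, 5400, 9100, 15000]
genes = {"geneA": (1000, 1300), "geneB": (5000, 6000)}
res = clustering_pvalues(snps, genes, genome_length=20000)

for gene, (count, pval) in res.items():
    print(f"{gene}: {count} SNPs, p = {pval:.3g}")
```

In this toy example, geneA holds five of the eight SNPs in a short region and receives a very small p-value, while geneB holds one SNP and is unremarkable. A real analysis would need a null model fitted to the data and a correction for testing many genes.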

CONCLUSION

SweepCluster is applicable to a broad category of datasets. It will be valuable for detecting gene-specific sweeps in diverse genotypic data and will provide novel insights into adaptive evolution.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be9f/8734265/1f5938df2979/12859_2021_4533_Fig1_HTML.jpg
