Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics.

Affiliation

Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA.

Publication information

Mol Biol Evol. 2023 Jul 5;40(7). doi: 10.1093/molbev/msad157.

Abstract

Natural selection leaves a spatial pattern along the genome, with a haplotype distribution distortion near the selected locus that fades with distance. Evaluating the spatial signal of a population-genetic summary statistic across the genome allows for patterns of natural selection to be distinguished from neutrality. Considering the genomic spatial distribution of multiple summary statistics is expected to aid in uncovering subtle signatures of selection. In recent years, numerous methods have been devised that consider genomic spatial distributions across summary statistics, utilizing both classical machine learning and deep learning architectures. However, better predictions may be attainable by improving the way in which features are extracted from these summary statistics. We apply wavelet transform, multitaper spectral analysis, and S-transform to summary statistic arrays to achieve this goal. Each analysis method converts one-dimensional summary statistic arrays to two-dimensional images of spectral analysis, allowing simultaneous temporal and spectral assessment. We feed these images into convolutional neural networks and consider combining models using ensemble stacking. Our modeling framework achieves high accuracy and power across a diverse set of evolutionary settings, including population size changes and test sets of varying sweep strength, softness, and timing. A scan of central European whole-genome sequences recapitulated well-established sweep candidates and predicted novel cancer-associated genes as sweeps with high support. Given that this modeling framework is also robust to missing genomic segments, we believe that it will represent a welcome addition to the population-genomic toolkit for learning about adaptive processes from genomic data.
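
The abstract describes turning each one-dimensional summary-statistic array into a two-dimensional image that resolves both position along the genome (the "temporal" axis) and frequency, and then feeding those images to a CNN. The sketch below illustrates that step with one of the three named transforms, a discrete S-transform (Stockwell transform) computed with NumPy's FFT. It is not the authors' implementation. The function names `stockwell_transform` and `summary_stats_to_image`, the 128-window array length, and the synthetic diversity profile are hypothetical. The paper's own windowing, normalization, and image size may differ.

```python
import numpy as np


def stockwell_transform(x):
    """Discrete S-transform of a 1D signal, computed via the FFT.

    Returns a complex array of shape (N // 2 + 1, N). Rows are frequencies
    0..N/2 (cycles per array length). Columns are positions along the signal.
    """
    x = np.asarray(x, dtype=float)
    n_samples = x.size
    spectrum = np.fft.fft(x)
    # Signed integer frequency offsets in FFT order: 0, 1, ..., -2, -1.
    offsets = np.fft.fftfreq(n_samples) * n_samples
    n_freqs = n_samples // 2 + 1
    s = np.empty((n_freqs, n_samples), dtype=complex)
    # By convention, the zero-frequency row holds the signal mean.
    s[0, :] = x.mean()
    for f in range(1, n_freqs):
        # Gaussian window in the frequency domain. Its width scales with f,
        # which gives finer positional resolution at higher frequencies.
        window = np.exp(-2.0 * np.pi**2 * offsets**2 / f**2)
        # Shift the spectrum so frequency f sits at index 0, window it, invert.
        s[f, :] = np.fft.ifft(np.roll(spectrum, -f) * window)
    return s


def summary_stats_to_image(stat_arrays):
    """Hypothetical helper: stack |S-transform| of each statistic as a channel."""
    return np.stack([np.abs(stockwell_transform(a)) for a in stat_arrays])


# Synthetic, illustrative input: 128 genomic windows with a sweep-like trough
# in a diversity-like statistic centered on window 64 (not real data).
positions = np.arange(128)
diversity = 1.0 - 0.8 * np.exp(-(((positions - 64) / 10.0) ** 2))
rng = np.random.default_rng(0)
noisy_stat = diversity + 0.05 * rng.standard_normal(positions.size)

image = summary_stats_to_image([diversity, noisy_stat])
print(image.shape)  # (2, 65, 128): channels x frequencies x positions
```

Taking the magnitude discards phase, and stacking one image per statistic gives a channels-first array of the kind a 2D CNN takes as input. Under the same assumptions, the wavelet or multitaper variants named in the abstract would replace `stockwell_transform` with a different time-frequency transform and keep the stacking step unchanged.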

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a447/10365025/938ab894b11d/msad157f1.jpg