Distinguishing between recent balancing selection and incomplete sweep using deep neural networks.

Affiliations

Department of Biological Sciences, Middle East Technical University, Ankara, Turkey.

Laboratory of Medical Genetics, Department of Biomedical Sciences and Human Oncology, Università degli Studi di Bari Aldo Moro, Bari, Italy.

Publication information

Mol Ecol Resour. 2021 Nov;21(8):2706-2718. doi: 10.1111/1755-0998.13379. Epub 2021 Apr 5.

Abstract

Balancing selection is an important adaptive mechanism underpinning a wide range of phenotypes. Despite its relevance, the detection of recent balancing selection from genomic data is challenging as its signatures are qualitatively similar to those left by ongoing positive selection. In this study, we developed and implemented two deep neural networks and tested their performance in predicting loci under recent selection, either due to balancing selection or incomplete sweep, from population genomic data. Specifically, we generated forward-in-time simulations to train and test an artificial neural network (ANN) and a convolutional neural network (CNN). ANN received as input multiple summary statistics calculated on the locus of interest, while CNN was applied directly to the matrix of haplotypes. We found that both architectures identify loci under recent selection with high accuracy. CNN generally outperformed ANN in distinguishing between signals of balancing selection and incomplete sweep and was less affected by incorrect training data. We deployed both trained networks on neutral genomic regions in European populations and demonstrated a lower false-positive rate for CNN than ANN. We finally deployed CNN within the MEFV gene region and identified several common variants predicted to be under incomplete sweep in a European population. Notably, two of these variants are functional changes and could modulate susceptibility to familial Mediterranean fever, possibly as a consequence of past adaptation to pathogens. In conclusion, deep neural networks were able to characterize signals of selection on intermediate frequency variants, an analysis currently inaccessible by commonly used strategies.
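The abstract contrasts two kinds of input: summary statistics computed on a locus (for the ANN) and the raw matrix of haplotypes (for the CNN). The abstract does not say which statistics were used, so the following is only a minimal sketch, assuming a binary haplotype matrix (rows are sampled chromosomes, columns are sites, 0 is ancestral and 1 is derived) and a few classic statistics chosen for illustration: the number of segregating sites, nucleotide diversity, Watterson's estimator, and Tajima's D. The function names `tajimas_d` and `summary_statistics` are hypothetical and do not come from the paper.

```python
import math


def tajimas_d(n, s, pi):
    """Tajima's D from sample size n, segregating sites s, and diversity pi.

    Returns NaN when the statistic is undefined (no segregating sites or
    zero variance), which is an illustrative convention chosen here.
    """
    if s == 0:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return float("nan")
    return (pi - s / a1) / math.sqrt(variance)


def summary_statistics(haplotypes):
    """Summarize a 0/1 haplotype matrix (list of equal-length rows)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    # Derived-allele count at each site (one count per column).
    counts = [sum(column) for column in zip(*haplotypes)]
    # Keep only polymorphic sites.
    segregating = [k for k in counts if 0 < k < n]
    s = len(segregating)
    # Nucleotide diversity: mean number of pairwise differences.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in segregating)
    a1 = sum(1.0 / i for i in range(1, n))
    return {
        "S": s,
        "pi": pi,
        "theta_w": s / a1,
        "tajimas_d": tajimas_d(n, s, pi),
    }


# Toy example: 4 haplotypes by 4 sites (made-up data, not from the study).
example = [
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
]
print(summary_statistics(example))
```

A vector of such values would be one plausible ANN input, whereas a CNN would take a matrix like `example` directly. An excess of intermediate-frequency variants pushes Tajima's D upward, so a single statistic of this kind can look similar under balancing selection and under an incomplete sweep. This matches the difficulty the abstract describes.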
