Detecting Interspecific Positive Selection Using Convolutional Neural Networks.

Author information

West Charlotte, Walker Conor R, Arasti Shayesteh, Vasilev Viacheslav, Xu Xingze, De Maio Nicola, Goldman Nick

Affiliations

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK.

Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK.

Publication information

Mol Biol Evol. 2025 Jul 1;42(7). doi: 10.1093/molbev/msaf154.

Abstract

Traditional statistical methods using maximum likelihood and Bayesian inference can detect positive selection from an interspecific phylogeny and a codon sequence alignment based on model assumptions, but they are prone to false positives due to alignment errors and can lack power. These problems are particularly pronounced when faced with high levels of indels and divergence. To address these issues, we trained and tested convolutional neural network models on simulated data and achieved higher accuracy in detecting selection across a specific range of phylogenetic scenarios and evolutionary modes. This advantage is particularly evident when performing inference on noisy data prone to misalignments. Our method shows some ability to account for these errors, where most statistical frameworks fail to do so in a tractable manner. We explore the generalizability of our convolutional neural network models to unseen evolutionary scenarios and identify future avenues to achieve broader utility. Once trained, our convolutional neural network model is faster at test time, making it a scalable alternative to traditional statistical methods for large-scale, multigene analyses. In addition to binary classification (inference of the presence or absence of positive selection during the evolution of the sequences), we use saliency maps to understand what the model learns and observe how this could be leveraged for sitewise inference of positive selection.
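The idea of asking which alignment sites drive a classifier's output can be illustrated with a small sketch. The abstract does not describe the network, the input encoding, or how the saliency maps are computed, so everything below is an assumption made for illustration: the one-hot encoding with a gap channel, the `toy_score` function (a hypothetical stand-in for a trained model, not the authors' CNN), and the use of occlusion sensitivity, a perturbation-based substitute for gradient-based saliency maps.

```python
import numpy as np

ALPHABET = "ACGT-"  # assumed channels: four nucleotides plus a gap symbol


def one_hot_alignment(seqs):
    """Encode aligned sequences as an array of shape (n_species, n_sites, 5)."""
    n_sites = len(seqs[0])
    if any(len(s) != n_sites for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    index = {ch: i for i, ch in enumerate(ALPHABET)}
    x = np.zeros((len(seqs), n_sites, len(ALPHABET)), dtype=np.float32)
    for r, seq in enumerate(seqs):
        for c, ch in enumerate(seq.upper()):
            # Symbols outside the alphabet are treated as gaps (an assumption).
            x[r, c, index.get(ch, index["-"])] = 1.0
    return x


def toy_score(x):
    """Hypothetical stand-in for a trained classifier; returns a value in (0, 1).

    It simply rewards columns in which species carry different nucleotides.
    It is NOT the model from the paper.
    """
    present = x[:, :, :4].max(axis=0)            # (n_sites, 4): which bases occur
    distinct = present.sum(axis=1)               # number of distinct bases per site
    variability = np.maximum(distinct - 1.0, 0.0)
    return float(1.0 / (1.0 + np.exp(-(variability.sum() - 1.0))))


def codon_occlusion_saliency(x, score_fn):
    """Zero out one codon (three columns) at a time; record the absolute score change."""
    base = score_fn(x)
    n_codons = x.shape[1] // 3
    saliency = np.zeros(n_codons)
    for k in range(n_codons):
        masked = x.copy()
        masked[:, 3 * k:3 * k + 3, :] = 0.0
        saliency[k] = abs(base - score_fn(masked))
    return saliency


# Three species, three codons; only the middle codon differs between species.
seqs = ["ATGAAACCC", "ATGAGACCC", "ATGACACCC"]
x = one_hot_alignment(seqs)
saliency = codon_occlusion_saliency(x, toy_score)
print(int(np.argmax(saliency)))  # the middle codon (index 1) carries all the signal
```

With a real trained network, `toy_score` would be replaced by the model's predicted probability of positive selection; a per-codon profile of this kind is the sort of signal that sitewise inference could build on.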

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1c7/12287699/d3de7c8950bf/msaf154f1.jpg
