Faculty of EEMCS, University of Twente, Enschede, The Netherlands.
Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, Greece.
Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i194-i203. doi: 10.1093/bioinformatics/btad265.
Recent methods for selective sweep detection cast the problem as a classification task and use summary statistics as features to capture region characteristics that are indicative of a selective sweep, which makes them sensitive to confounding factors. Furthermore, they are not designed to perform whole-genome scans or to estimate the extent of the genomic region affected by positive selection; both capabilities are required for identifying candidate genes and for estimating the time and strength of selection.
We present ASDEC (https://github.com/pephco/ASDEC), a neural-network-based framework that can scan whole genomes for selective sweeps. ASDEC achieves classification performance similar to that of other convolutional neural network-based classifiers that rely on summary statistics, but because it infers region characteristics directly from the raw sequence data, it is trained 10× faster and classifies genomic regions 5× faster. When deployed for genomic scans, ASDEC achieved up to 15.2× higher sensitivity, 19.4× higher success rates, and 4× higher detection accuracy than state-of-the-art methods. We used ASDEC to scan human chromosome 1 of the Yoruba population (1000 Genomes Project), identifying nine known candidate genes.
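The general idea of a window-based genome scan can be illustrated with a minimal Python sketch. This is not ASDEC's implementation: the function names (`placeholder_score`, `scan`, `sweep_extent`), the window parameters, and the scoring rule are all hypothetical. In particular, `placeholder_score` is a simple stand-in (the fraction of monomorphic SNP columns) for where a trained neural network would score the raw window, and the extent estimate is just the span of windows above a threshold.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# One row per haplotype, one column per SNP (0 = ancestral, 1 = derived).
Window = List[Sequence[int]]


def placeholder_score(window: Window) -> float:
    """Hypothetical stand-in for a trained classifier.

    Returns the fraction of SNP columns that are monomorphic in the window,
    so low-diversity windows score high. A real scan would call a model here.
    """
    n_cols = len(window[0])
    mono = sum(1 for j in range(n_cols) if len({row[j] for row in window}) == 1)
    return mono / n_cols


def scan(
    haplotypes: Window,
    positions: Sequence[int],
    width: int,
    step: int,
    score: Callable[[Window], float] = placeholder_score,
) -> List[Tuple[float, float]]:
    """Slide a window of `width` SNPs in steps of `step` SNPs.

    Returns one (center position in bp, score) pair per window.
    """
    results = []
    for start in range(0, len(positions) - width + 1, step):
        window = [row[start:start + width] for row in haplotypes]
        center = (positions[start] + positions[start + width - 1]) / 2
        results.append((center, score(window)))
    return results


def sweep_extent(
    results: Sequence[Tuple[float, float]], threshold: float
) -> Optional[Tuple[float, float]]:
    """Crude extent estimate: first and last window centers scoring >= threshold."""
    hits = [center for center, s in results if s >= threshold]
    return (hits[0], hits[-1]) if hits else None


if __name__ == "__main__":
    # Toy data: SNP columns 2..5 are fixed for the derived allele.
    haps = [
        [0, 1, 1, 1, 1, 1, 0, 1],
        [1, 0, 1, 1, 1, 1, 1, 0],
        [0, 1, 1, 1, 1, 1, 0, 1],
        [1, 0, 1, 1, 1, 1, 1, 0],
    ]
    pos = [100, 200, 300, 400, 500, 600, 700, 800]
    for center, s in scan(haps, pos, width=4, step=2):
        print(f"center={center:.0f} bp  score={s:.2f}")
```

Passing the scoring function as a parameter keeps the scanning loop independent of the model, so the same loop could in principle wrap a summary-statistic scorer or a network that consumes the raw window.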