Inference of Coalescence Times and Variant Ages Using Convolutional Neural Networks.

Affiliations

Department of Statistics, University of Oxford, Oxford, UK.

Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.

Publication information

Mol Biol Evol. 2023 Oct 4;40(10). doi: 10.1093/molbev/msad211.

Abstract

Accurate inference of the time to the most recent common ancestor (TMRCA) between pairs of individuals and of the age of genomic variants is key in several population genetic analyses. We developed a likelihood-free approach, called CoalNN, which uses a convolutional neural network to predict pairwise TMRCAs and allele ages from sequencing or SNP array data. CoalNN is trained through simulation and can be adapted to varying parameters, such as demographic history, using transfer learning. Across several simulated scenarios, CoalNN matched or outperformed the accuracy of model-based approaches for pairwise TMRCA and allele age prediction. We applied CoalNN to settings for which model-based approaches are under-developed and performed analyses to gain insights into the set of features it uses to perform TMRCA prediction. We next used CoalNN to analyze 2,504 samples from 26 populations in the 1000 Genomes Project data set, inferring the age of ∼80 million variants. We observed substantial variation across populations and for variants predicted to be pathogenic, reflecting heterogeneous demographic histories and the action of negative selection. We used CoalNN's predicted allele ages to construct genome-wide annotations capturing the signature of past negative selection. We performed LD-score regression analysis of heritability using summary association statistics from 63 independent complex traits and diseases (average N=314k), observing increased annotation-specific effects on heritability compared to a previous allele age annotation. These results highlight the effectiveness of using likelihood-free, simulation-trained models to infer properties of gene genealogies in large genomic data sets.
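To make the quantity being predicted concrete, the following is a minimal sketch of how pairwise TMRCAs can be simulated and then recovered with a simple moment estimator. It is not CoalNN and not the authors' simulation pipeline. It assumes a constant-size diploid population, independent non-recombining loci, and a Poisson mutation model; the function names and the parameter values (`n_e`, `mu`, `seq_len`) are hypothetical choices made only for illustration.

```python
import math
import random


def simulate_pairwise_tmrca(n_e, n_loci, seed=0):
    """Draw independent pairwise TMRCAs, in generations.

    Assumption: constant-size diploid population, so two lineages
    coalesce at rate 1 / (2 * n_e) per generation (mean 2 * n_e).
    """
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / (2 * n_e)) for _ in range(n_loci)]


def _poisson(rng, lam):
    """Sample a Poisson variate with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_differences(tmrcas, mu, seq_len, seed=1):
    """Count mutations separating the pair at each locus.

    Both lineages accumulate mutations for t generations, so the
    expected number of differences is 2 * mu * seq_len * t.
    """
    rng = random.Random(seed)
    return [_poisson(rng, 2 * mu * seq_len * t) for t in tmrcas]


def moment_estimate(n_diff, mu, seq_len):
    """Moment estimator of the TMRCA from a difference count."""
    return n_diff / (2 * mu * seq_len)


if __name__ == "__main__":
    n_e, mu, seq_len = 10_000, 1e-8, 10_000  # hypothetical values
    true_t = simulate_pairwise_tmrca(n_e, n_loci=20_000)
    diffs = simulate_differences(true_t, mu, seq_len)
    est_t = [moment_estimate(d, mu, seq_len) for d in diffs]
    print("mean true TMRCA:     ", sum(true_t) / len(true_t))
    print("mean estimated TMRCA:", sum(est_t) / len(est_t))
```

Per-locus estimates of this kind are very noisy because each locus carries only a handful of mutations. A simulation-trained predictor is given simulated data paired with the true TMRCAs, such as `true_t` here, as training targets.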

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9eaf/10581698/d504e3ec5cd2/msad211f1.jpg
