Department of Genetics, Stanford University, Stanford, USA.
Department of Bioengineering, University of Washington, Seattle, USA.
Genome Biol. 2022 Nov 5;23(1):232. doi: 10.1186/s13059-022-02799-4.
3'-end processing by cleavage and polyadenylation is an important and finely tuned regulatory process during mRNA maturation. Numerous genetic variants are known to cause or contribute to human disorders by disrupting the cis-regulatory code of polyadenylation signals. Yet, due to the complexity of this code, variant interpretation remains challenging.
We introduce a residual neural network model, APARENT2, that can infer 3'-cleavage and polyadenylation from DNA sequence more accurately than any previous model. This model generalizes to the case of alternative polyadenylation (APA) for a variable number of polyadenylation signals. We demonstrate APARENT2's performance on several variant datasets, including functional reporter data and human 3' aQTLs from GTEx. We apply neural network interpretation methods to gain insights into disrupted or protective higher-order features of polyadenylation. We fine-tune APARENT2 on human tissue-resolved transcriptomic data to elucidate tissue-specific variant effects. By combining APARENT2 with models of mRNA stability, we extend aQTL effect size predictions to the entire 3' untranslated region. Finally, we perform in silico saturation mutagenesis of all human polyadenylation signals and compare the predicted effects of [Formula: see text] million variants against gnomAD. While loss-of-function variants were generally selected against, we also find specific clinical conditions linked to gain-of-function mutations. For example, we detect an association between gain-of-function mutations in the 3'-end and autism spectrum disorder. To experimentally validate APARENT2's predictions, we assayed clinically relevant variants in multiple cell lines, including microglia-derived cells.
A sequence-to-function model based on deep residual learning enables accurate functional interpretation of genetic variants in polyadenylation signals and, when coupled with large human variation databases, elucidates the link between functional 3'-end mutations and human health.
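As a rough illustration of the in silico saturation mutagenesis mentioned above, the sketch below enumerates every single-nucleotide substitution in a sequence and scores each against the reference. It is a minimal sketch, not the authors' pipeline: `toy_score` is a hypothetical stand-in for a trained model such as APARENT2 (it only checks for the canonical AATAAA hexamer), and summarizing each effect as a difference in log odds is an assumption made here for illustration.

```python
import math

BASES = "ACGT"


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained sequence model (NOT APARENT2).

    Returns a pseudo 'polyadenylation signal usage' probability in (0, 1)
    that is high when the canonical AATAAA hexamer is present.
    """
    raw = 2.0 if "AATAAA" in seq else -2.0
    return 1.0 / (1.0 + math.exp(-raw))


def log_odds(p: float) -> float:
    """Convert a probability into log odds."""
    return math.log(p / (1.0 - p))


def saturation_mutagenesis(seq: str, score=toy_score) -> dict:
    """Score every single-nucleotide substitution in `seq`.

    Returns a dict mapping (position, ref_base, alt_base) to the change
    in log odds relative to the unmutated sequence. Negative values
    suggest loss of function, positive values gain of function.
    """
    ref_log_odds = log_odds(score(seq))
    effects = {}
    for pos, ref_base in enumerate(seq):
        for alt_base in BASES:
            if alt_base == ref_base:
                continue  # skip the reference allele itself
            mutant = seq[:pos] + alt_base + seq[pos + 1:]
            effects[(pos, ref_base, alt_base)] = (
                log_odds(score(mutant)) - ref_log_odds
            )
    return effects


# Illustrative 12-nt sequence with AATAAA at positions 3-8 (0-based).
effects = saturation_mutagenesis("GGCAATAAAGGC")
```

With a real model, `score` would be replaced by the model's prediction call, and the resulting per-variant effects could then be compared against observed variant frequencies.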