Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands.
Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands.
Sci Rep. 2021 May 19;11(1):10606. doi: 10.1038/s41598-021-89904-y.
Allele specific expression (ASE) refers to a difference in the expression levels of the alternative alleles of a gene and is measured by RNA sequencing. Multiple studies show that ASE plays a role in hereditary diseases by modulating penetrance or phenotype severity. However, genome diagnostics is based on DNA sequencing and therefore neglects gene expression regulation such as ASE. To take advantage of ASE in the absence of RNA sequencing, it must be predicted from DNA variation alone. We have constructed ASE models from BIOS (n = 3432) and GTEx (n = 369) that predict ASE using DNA features. These models are highly reproducible and comprise many different feature types, highlighting the complex regulation that underlies ASE. We applied the BIOS-trained model to population variants in three genes in which ASE plays a clinically relevant role: BRCA2, RET and NF1. This yielded predicted ASE effects for 27 variants, 10 of which were known pathogenic variants. We demonstrated that ASE can be predicted from DNA features using machine learning. Future efforts may improve sensitivity and translate these models into a new type of genome diagnostic tool that prioritizes candidate pathogenic variants, or regulators thereof, for follow-up validation by RNA sequencing. All code and machine learning models used are available at GitHub and Zenodo.
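At a heterozygous site, ASE is commonly quantified from RNA sequencing by counting reads that support each allele and testing whether the split departs from the 50:50 ratio expected under balanced expression. The sketch below illustrates that idea with an exact two-sided binomial test. It is an illustrative assumption, not the procedure used to label ASE in BIOS or GTEx in this study, and the function names and the `alpha` and `min_depth` thresholds are hypothetical.

```python
from math import comb


def allelic_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of RNA-seq reads supporting the alternative allele at a heterozygous site."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of every outcome
    that is no more likely than the observed count k under Binomial(n, p)."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so tied outcomes are not dropped by rounding error.
    p_value = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, p_value)


def call_ase(ref_reads: int, alt_reads: int, alpha: float = 0.05, min_depth: int = 10):
    """Hypothetical ASE call: True if the allele split deviates significantly from 50:50,
    False if not, None if the site has too few reads to test."""
    n = ref_reads + alt_reads
    if n < min_depth:
        return None
    return binomial_two_sided_p(alt_reads, n) < alpha


if __name__ == "__main__":
    for ref, alt in [(50, 50), (90, 10), (3, 2)]:
        print(ref, alt, allelic_ratio(ref, alt), call_ase(ref, alt))
```

In practice, pipelines usually also account for reference-mapping bias and overdispersion (for example with a beta-binomial model), which this simple binomial sketch ignores.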