In silico generation and augmentation of regulatory variants from massively parallel reporter assay using conditional variational autoencoder.

Author information

Jin Weijia, Xia Yi, Thela Sai Ritesh, Liu Yunlong, Chen Li

Affiliations

Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA.

Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.

Publication information

bioRxiv. 2024 Jun 29:2024.06.25.600715. doi: 10.1101/2024.06.25.600715.

Abstract

Predicting the functional consequences of genetic variants in non-coding regions is a challenging problem. Massively parallel reporter assays (MPRAs), a high-throughput method, can test thousands of variants simultaneously by evaluating whether they show allele-specific regulatory activity. Nevertheless, the labelled variants that MPRAs identify as having differential allelic regulatory effects on gene expression usually number only in the hundreds, which limits their use as a training set for robust genome-wide prediction. To address this limitation, we propose a deep generative model, MpraVAE, that generates labelled variants to augment the training sample size. By benchmarking on several MPRA datasets, we demonstrate that MpraVAE significantly improves prediction performance for MPRA regulatory variants compared with the baseline method, conventional data augmentation approaches, and existing variant scoring methods. Taking autoimmune diseases as an example, we apply MpraVAE to perform a genome-wide prediction of regulatory variants and find that the predicted regulatory variants are more enriched than background variants in enhancers, active histone marks, open chromatin regions in immune-related cell types, and chromatin states associated with promoter and enhancer activity and with binding sites of cMyC and Pol II that regulate gene expression. Importantly, by leveraging chromatin loops and accessible chromatin, the predicted regulatory variants are found to be linked to immune-related genes, demonstrating the importance of MpraVAE for genetic and gene discovery in complex traits.
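To make the idea of label-conditioned generation concrete, the sketch below shows the general mechanics of a conditional variational autoencoder (CVAE) over one-hot-encoded DNA sequences: an encoder q(z | x, y), the reparameterization trick, a decoder p(x | z, y), the ELBO loss, and sampling new sequences for a chosen label. This is a generic, hypothetical illustration, not the MpraVAE architecture. The class name `TinyCVAE`, the single linear layer per side, the latent size, and the label convention (1 = regulatory variant, 0 = non-regulatory) are all assumptions. Weights are random and no training loop is shown; in practice the loss would be minimized with an autodiff framework.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (len(seq), 4) one-hot matrix."""
    idx = [ALPHABET.index(b) for b in seq]
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), idx] = 1.0
    return out


class TinyCVAE:
    """Hypothetical one-layer linear CVAE for fixed-length sequences (illustration only)."""

    def __init__(self, seq_len, latent_dim=8, n_labels=2, seed=0):
        self.rng = np.random.default_rng(seed)
        self.L, self.k, self.c = seq_len, latent_dim, n_labels
        d_in = seq_len * 4 + n_labels  # flattened sequence + one-hot label
        self.W_enc = self.rng.normal(0, 0.1, (d_in, 2 * latent_dim))
        self.W_dec = self.rng.normal(0, 0.1, (latent_dim + n_labels, seq_len * 4))

    def _label(self, y):
        return np.eye(self.c)[y]  # one-hot condition vector

    def encode(self, x, y):
        """Return mean and log-variance of q(z | x, y)."""
        h = np.concatenate([x.ravel(), self._label(y)]) @ self.W_enc
        return h[: self.k], h[self.k:]

    def decode(self, z, y):
        """Return per-position nucleotide probabilities of p(x | z, y)."""
        logits = (np.concatenate([z, self._label(y)]) @ self.W_dec).reshape(self.L, 4)
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)

    def elbo_loss(self, x, y):
        """Negative ELBO: reconstruction cross-entropy plus KL to N(0, I)."""
        mu, logvar = self.encode(x, y)
        eps = self.rng.standard_normal(self.k)
        z = mu + np.exp(0.5 * logvar) * eps  # reparameterization trick
        p = self.decode(z, y)
        recon = -np.sum(x * np.log(p + 1e-12))
        kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))
        return recon + kl

    def generate(self, y, n=1):
        """Sample n new sequences conditioned on label y by drawing z from the prior."""
        seqs = []
        for _ in range(n):
            z = self.rng.standard_normal(self.k)
            p = self.decode(z, y)
            idx = [self.rng.choice(4, p=row) for row in p]  # sample a base per position
            seqs.append("".join(ALPHABET[i] for i in idx))
        return seqs


if __name__ == "__main__":
    model = TinyCVAE(seq_len=10, latent_dim=4)
    x = one_hot("ACGTACGTAC")
    print("loss:", model.elbo_loss(x, y=1))
    print("synthetic 'regulatory' sequences:", model.generate(y=1, n=3))
```

Conditioning both the encoder and the decoder on the label is what lets such a model be asked for additional examples of a specific class, which is the sense in which a CVAE can augment a small labelled set. With untrained weights, the generated sequences are arbitrary.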

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7530/11230389/7557a4425b62/nihpp-2024.06.25.600715v1-f0001.jpg