In silico generation and augmentation of regulatory variants from massively parallel reporter assay using conditional variational autoencoder.

Authors

Jin Weijia, Xia Yi, Thela Sai Ritesh, Liu Yunlong, Chen Li

Affiliations

Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA.

Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.

Publication information

bioRxiv. 2024 Jun 29:2024.06.25.600715. doi: 10.1101/2024.06.25.600715.

DOI: 10.1101/2024.06.25.600715
PMID: 38979263
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11230389/
Abstract

Predicting the functional consequences of genetic variants in non-coding regions is a challenging problem. Massively parallel reporter assays (MPRAs), a high-throughput method, can test thousands of variants simultaneously for allele-specific regulatory activity. Nevertheless, the labelled variants identified by MPRAs, that is, those showing differential allelic regulatory effects on gene expression, usually number only in the hundreds, which limits their use as a training set for robust genome-wide prediction. To address this limitation, we propose a deep generative model, MpraVAE, to generate labelled variants and augment the training sample size. By benchmarking on several MPRA datasets, we demonstrate that MpraVAE significantly improves the prediction performance for MPRA regulatory variants compared with the baseline method, conventional data augmentation approaches, and existing variant scoring methods. Taking autoimmune diseases as an example, we apply MpraVAE to perform a genome-wide prediction of regulatory variants and find that the predicted regulatory variants are more enriched than background variants in enhancers, active histone marks, open chromatin regions in immune-related cell types, and chromatin states associated with promoter and enhancer activity and with binding sites of c-Myc and Pol II, which regulate gene expression. Importantly, the predicted regulatory variants are linked to immune-related genes through chromatin loops and accessible chromatin, demonstrating the importance of MpraVAE in genetic and gene discovery for complex traits.
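
The abstract names a conditional variational autoencoder but gives no architecture, sequence encoding, or loss. The following is therefore only a generic sketch of the per-sample conditional VAE objective, not MpraVAE itself. It assumes a one-hot DNA encoding, a binary regulatory label appended to both the encoder and decoder inputs, and a standard normal prior that does not depend on the label. All function names are hypothetical, and the encoder and decoder outputs are placeholder values rather than a trained network.

```python
import math
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a flat list of 4 indicators per base."""
    return [1.0 if b == base else 0.0 for b in seq for base in BASES]

def with_label(vec, label):
    """Condition a vector on a binary class label by appending it."""
    return vec + [float(label)]

def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))

def reconstruction_nll(seq, probs):
    """Negative log-likelihood of the true bases under per-position probabilities."""
    return -sum(math.log(p[BASES.index(b)]) for b, p in zip(seq, probs))

def negative_elbo(seq, probs, mu, logvar):
    """Per-sample training loss: reconstruction term plus KL regularizer."""
    return reconstruction_nll(seq, probs) + kl_to_standard_normal(mu, logvar)

# Toy walk-through with placeholder network outputs (assumptions, not real data).
rng = random.Random(0)
seq, label = "ACGT", 1                            # label 1 = regulatory variant
encoder_input = with_label(one_hot(seq), label)   # 4 bases * 4 + 1 label = 17 values
mu, logvar = [0.0, 0.0], [0.0, 0.0]               # stand-in for encoder output
z = reparameterize(mu, logvar, rng)
decoder_input = with_label(z, label)              # latent code plus the same label
probs = [[0.25] * 4 for _ in seq]                 # stand-in for decoder output
loss = negative_elbo(seq, probs, mu, logvar)
```

After training, new labelled examples would be generated by sampling z from the prior, appending the desired label, and decoding. With the placeholder values above, the KL term is zero and the loss reduces to the cross-entropy of a uniform decoder over four positions.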

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7530/11230389/7557a4425b62/nihpp-2024.06.25.600715v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7530/11230389/a32bd711c0d7/nihpp-2024.06.25.600715v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7530/11230389/8dc6f8fc1760/nihpp-2024.06.25.600715v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7530/11230389/7ff64d7c9a97/nihpp-2024.06.25.600715v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7530/11230389/feacabc45a1f/nihpp-2024.06.25.600715v1-f0005.jpg

Similar articles

1
In silico generation and augmentation of regulatory variants from massively parallel reporter assay using conditional variational autoencoder.
bioRxiv. 2024 Jun 29:2024.06.25.600715. doi: 10.1101/2024.06.25.600715.
2
QuASAR-MPRA: accurate allele-specific analysis for massively parallel reporter assays.
Bioinformatics. 2018 Mar 1;34(5):787-794. doi: 10.1093/bioinformatics/btx598.
3
Massively parallel reporter assays and mouse transgenic assays provide complementary information about neuronal enhancer activity.
bioRxiv. 2024 Apr 23:2024.04.22.590634. doi: 10.1101/2024.04.22.590634.
4
Deep learning-assisted genome-wide characterization of massively parallel reporter assays.
Nucleic Acids Res. 2022 Nov 11;50(20):11442-11454. doi: 10.1093/nar/gkac990.
5
Deciphering regulatory DNA sequences and noncoding genetic variants using neural network models of massively parallel reporter assays.
PLoS One. 2019 Jun 17;14(6):e0218073. doi: 10.1371/journal.pone.0218073. eCollection 2019.
6
Massively parallel reporter assays and variant scoring identified functional variants and target genes for melanoma loci and highlighted cell-type specificity.
Am J Hum Genet. 2022 Dec 1;109(12):2210-2229. doi: 10.1016/j.ajhg.2022.11.006. Epub 2022 Nov 23.
7
Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay.
Genome Res. 2013 May;23(5):800-11. doi: 10.1101/gr.144899.112. Epub 2013 Mar 19.
8
Systematic investigation of allelic regulatory activity of schizophrenia-associated common variants.
Cell Genom. 2023 Sep 15;3(10):100404. doi: 10.1016/j.xgen.2023.100404. eCollection 2023 Oct 11.
9
Identification of Functional Variants in the FAM13A Chronic Obstructive Pulmonary Disease Genome-Wide Association Study Locus by Massively Parallel Reporter Assays.
Am J Respir Crit Care Med. 2019 Jan 1;199(1):52-61. doi: 10.1164/rccm.201802-0337OC.
10
Meta-analysis of massively parallel reporter assays enables prediction of regulatory function across cell types.
Hum Mutat. 2019 Sep;40(9):1299-1313. doi: 10.1002/humu.23820. Epub 2019 Jun 18.
