Genomic data imputation with variational auto-encoders.

Affiliations

Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, CA 94305, USA.

Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA.

Publication information

Gigascience. 2020 Aug 1;9(8). doi: 10.1093/gigascience/giaa082.

DOI: 10.1093/gigascience/giaa082
PMID: 32761097
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7407276/
Abstract

BACKGROUND

As missing values are frequently present in genomic data, practical methods for handling them are necessary for downstream analyses that require complete data sets. State-of-the-art imputation techniques, including methods based on singular value decomposition and K-nearest neighbors, can be computationally expensive for large data sets, and it is difficult to modify these algorithms to handle certain missing-not-at-random cases.
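For context on the baselines named above, K-nearest-neighbor (KNN) imputation fills a missing entry from the samples most similar to the incomplete one. The sketch below is a minimal pure-Python illustration, not the implementation benchmarked in the paper: the function name `knn_impute`, the use of `None` for missing values, the root-mean-square distance over co-observed features, and the unweighted neighbor mean are all assumptions. Its inner loop compares each incomplete sample with every other sample, which shows why this family of methods becomes costly on large data sets.

```python
import math
from typing import List, Optional

# Hypothetical representation: one list per sample, None marks a missing value.
Row = List[Optional[float]]


def knn_impute(data: List[Row], k: int = 3) -> List[List[float]]:
    """Fill each missing entry with the mean of that feature over the k nearest
    samples that observe it (sketch; the distance choice is an assumption)."""
    filled: List[List[float]] = []
    for i, row in enumerate(data):
        if all(v is not None for v in row):
            filled.append([float(v) for v in row])
            continue
        # Compare the incomplete sample with every other sample: this
        # all-pairs scan is what makes KNN imputation costly at scale.
        neighbours = []
        for j, other in enumerate(data):
            if j == i:
                continue
            shared = [(a, b) for a, b in zip(row, other)
                      if a is not None and b is not None]
            if shared:
                # Root-mean-square difference over co-observed features.
                d = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                neighbours.append((d, j))
        neighbours.sort()
        new_row: List[float] = []
        for f, v in enumerate(row):
            if v is not None:
                new_row.append(float(v))
                continue
            donors = [data[j][f] for _, j in neighbours if data[j][f] is not None][:k]
            if not donors:
                # No neighbour observes feature f: fall back to its column mean.
                donors = [r[f] for r in data if r[f] is not None]
            # If the whole column is missing, default to 0.0.
            new_row.append(sum(donors) / len(donors) if donors else 0.0)
        filled.append(new_row)
    return filled
```

For example, `knn_impute([[1.0, 2.0, 3.0], [1.1, 2.1, None], [5.0, 6.0, 7.0], [0.9, 1.9, 2.9]], k=2)` fills the missing value from the first and last rows (the two closest samples), giving about 2.95.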

RESULTS

In this work, we use a deep-learning framework based on the variational auto-encoder (VAE) for genomic missing value imputation and demonstrate its effectiveness in transcriptome and methylome data analysis. We show that in the vast majority of our testing scenarios, VAE achieves similar or better performance than the most widely used imputation standards, while having a computational advantage at evaluation time. When dealing with data missing not at random (e.g., few values are missing), we develop simple yet effective methods to leverage prior knowledge about the missing data. Furthermore, we investigate the effect of varying the latent space regularization strength in VAE on imputation performance and, in this context, show why VAE has a better imputation capacity than a regular deterministic auto-encoder.
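The "latent space regularization strength" mentioned above can be read as a weight β on the KL term of the VAE objective, as in β-VAE; that reading is an assumption here, as are the squared-error reconstruction term and the function names `gaussian_kl` and `beta_vae_loss`. Per sample, with encoder outputs μ and log σ², the loss is the reconstruction error plus β · KL(N(μ, diag σ²) ‖ N(0, I)), and the KL term has the closed form ½ Σ_j (μ_j² + σ_j² − 1 − log σ_j²).

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one sample."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, logvar))


def beta_vae_loss(x: Sequence[float], x_hat: Sequence[float],
                  mu: Sequence[float], logvar: Sequence[float],
                  beta: float = 1.0) -> float:
    """Squared-error reconstruction plus a beta-weighted KL penalty
    (an assumed form of the objective, for illustration only)."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + beta * gaussian_kl(mu, logvar)
```

Setting `beta=0.0` drops the KL penalty and leaves a pure reconstruction objective, the setting closest to a regular deterministic auto-encoder; larger values pull the approximate posterior toward the standard-normal prior. The abstract examines how varying this strength affects imputation and uses it to explain the VAE's advantage over a deterministic auto-encoder.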

CONCLUSIONS

We describe a deep learning imputation framework for transcriptome and methylome data using a VAE and show that it can be a preferable alternative to traditional methods for data imputation, especially in the setting of large-scale data and certain missing-not-at-random scenarios.
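A common way to use a trained auto-encoder for imputation is to start from a simple fill, pass the sample through the model, copy the reconstructed values into the missing positions only, and repeat. The sketch below assumes that scheme, since the abstract does not spell out the exact procedure; `iterative_impute`, the column-mean initialization, the fixed iteration count, and the `reconstruct` callable (a hypothetical stand-in for a trained VAE's encode-decode pass) are all illustrative. Imputing a sample this way needs only forward passes through the model rather than a search over the training data, which is consistent with the evaluation-time advantage the abstract reports.

```python
from typing import Callable, List, Optional

Row = List[Optional[float]]  # None marks a missing value
# Hypothetical stand-in for a trained VAE: maps a complete vector to its
# reconstruction (for example, decoding the posterior mean).
Reconstructor = Callable[[List[float]], List[float]]


def iterative_impute(data: List[Row], reconstruct: Reconstructor,
                     n_iter: int = 10) -> List[List[float]]:
    n_features = len(data[0])
    # Start from column means of the observed values (0.0 if none observed).
    col_means: List[float] = []
    for f in range(n_features):
        observed = [r[f] for r in data if r[f] is not None]
        col_means.append(sum(observed) / len(observed) if observed else 0.0)

    result: List[List[float]] = []
    for row in data:
        missing = [f for f, v in enumerate(row) if v is None]
        current = [col_means[f] if v is None else float(v)
                   for f, v in enumerate(row)]
        if missing:
            for _ in range(n_iter):
                x_hat = reconstruct(list(current))
                # Copy reconstructions into the missing slots only;
                # observed values are never overwritten.
                for f in missing:
                    current[f] = x_hat[f]
        result.append(current)
    return result
```

With a toy `reconstruct` that returns the row mean in every position, the input `[[2.0, None], [4.0, 4.0]]` starts the first row at `[2.0, 4.0]` (the observed column mean) and, after three iterations, yields `[2.0, 2.25]`; the complete second row is returned unchanged.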

Figures 1 to 5:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9002/7407276/5ee9bf71c760/giaa082fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9002/7407276/5783fb2c8d85/giaa082fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9002/7407276/ec51cffc178d/giaa082fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9002/7407276/0a58e98925ff/giaa082fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9002/7407276/4227d97aec31/giaa082fig5.jpg

