AE-TPGG: a novel autoencoder-based approach for single-cell RNA-seq data imputation and dimensionality reduction.

Authors

Zhao Shuchang, Zhang Li, Liu Xuejun

Affiliations

MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106 China.

Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, 210023 China.

Publication information

Front Comput Sci (Berl). 2023;17(3):173902. doi: 10.1007/s11704-022-2011-y. Epub 2022 Oct 26.

Abstract

Single-cell RNA sequencing (scRNA-seq) technology has become an effective tool for high-throughput transcriptomic studies. It circumvents the averaging artifacts associated with bulk RNA-seq technology, yielding new perspectives on the cellular diversity of seemingly homogeneous populations. Although various sequencing techniques have reduced the amplification bias caused by the low amount of starting material and improved capture efficiency, technical noise and biological variation are inevitably introduced into the experimental process, resulting in frequent dropout events that greatly hinder downstream analysis. Considering the bimodal expression pattern and the right-skewed characteristic of normalized scRNA-seq data, we propose a customized autoencoder based on a two-part generalized gamma distribution (AE-TPGG) for scRNA-seq data analysis. It accounts for the mixed discrete-continuous nature of scRNA-seq data with a two-part model and uses the generalized gamma (GG) distribution to fit the positive, right-skewed continuous data. The adopted autoencoder enables AE-TPGG to capture the inherent relationships between genes. In addition to producing a low-dimensional representation, the AE-TPGG model provides a denoised imputation according to the statistical characteristics of gene expression. Results on real datasets demonstrate that our proposed model is competitive with current imputation methods and improves a diverse set of typical scRNA-seq data analyses.
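
To make the two-part idea concrete, the following is a minimal sketch of a two-part generalized gamma log-likelihood for a single normalized expression value. It is an illustration only, not the authors' implementation. The abstract does not give the parameterization, so the sketch assumes Stacy's form of the GG density, f(x) = (p / a^d) x^(d-1) exp(-(x/a)^p) / Γ(d/p) for x > 0, and assumes that `pi` denotes the probability of a nonzero value. The function and parameter names (`gg_logpdf`, `tpgg_loglik`, `tpgg_mean`, `scale`, `d`, `p`) are hypothetical.

```python
import math


def gg_logpdf(x, scale, d, p):
    """Log-density of a generalized gamma variable at x > 0.

    Assumes Stacy's parameterization (an assumption, not taken from the paper):
    f(x) = (p / scale**d) * x**(d - 1) * exp(-(x / scale)**p) / Gamma(d / p)
    """
    if x <= 0:
        raise ValueError("the generalized gamma density is defined for x > 0")
    return (
        math.log(p)
        - d * math.log(scale)
        + (d - 1.0) * math.log(x)
        - (x / scale) ** p
        - math.lgamma(d / p)
    )


def tpgg_loglik(x, pi, scale, d, p):
    """Log-likelihood of one value under a two-part model.

    Part 1 (discrete): a point mass at zero with probability 1 - pi.
    Part 2 (continuous): positive values follow the GG density, weighted by pi.
    """
    if x == 0:
        return math.log1p(-pi)
    return math.log(pi) + gg_logpdf(x, scale, d, p)


def tpgg_mean(pi, scale, d, p):
    """Mean of the two-part variable: pi * E[GG].

    Under the assumed parameterization,
    E[GG] = scale * Gamma((d + 1) / p) / Gamma(d / p).
    """
    log_ratio = math.lgamma((d + 1.0) / p) - math.lgamma(d / p)
    return pi * scale * math.exp(log_ratio)


# Toy usage: total log-likelihood of one gene across five hypothetical cells.
cells = [0.0, 0.0, 1.3, 2.7, 0.4]
total = sum(tpgg_loglik(x, pi=0.6, scale=1.5, d=2.0, p=1.2) for x in cells)
```

In an autoencoder setting, parameters such as these would typically be produced per gene by the decoder, and the negative log-likelihood would serve as the reconstruction loss. Whether AE-TPGG follows exactly this scheme, or uses a mean such as `tpgg_mean` as the imputed value, cannot be determined from the abstract.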

ELECTRONIC SUPPLEMENTARY MATERIAL

Supplementary material is available in the online version of this article at 10.1007/s11704-022-2011-y.
